Methods and apparatus for cooling electronics

ABSTRACT

Methods and apparatus are provided for choosing an energy-efficient coolant temperature for electronics by considering the temperature dependence of the electronics' power dissipation. This dependence is explicitly considered in selecting the coolant temperature T₀ that is sent to the equipment. To minimize the power consumption P_(Total) of the entire system, where P_(Total)=P₀+P_(Cool) is the sum of the electronic equipment's power consumption P₀ and the cooling equipment's power consumption P_(Cool), P_(Total) is obtained experimentally, by measuring P₀ and P_(Cool), as a function of three parameters: coolant temperature T₀; weather-related temperature T₃ that affects the performance of free-cooling equipment; and computational state C of the electronic equipment, which affects the temperature dependence of its power consumption. This experiment provides, for each possible combination of T₃ and C, the value T*₀ of T₀ that minimizes P_(Total). During operation, for any combination of T₃ and C that occurs, the corresponding optimal coolant temperature T*₀ is selected, and the cooling equipment is commanded to produce it.

This invention was made with Government support under Contract No. B554331 awarded by the Department of Energy. The Government has certain rights in this invention.

The present invention is related to the cooling of electronics, such as computer equipment in a data center. More specifically, this invention is related to methods and apparatus for optimizing the energy efficiency of cooling such electronic equipment when the temperature dependence of the electronics' power dissipation is considered in conjunction with the power dissipated in cooling equipment such as a chiller. In particular, when a chiller is used in conjunction with a free-cooling heat exchanger, the invention provides apparatus and methods for choosing the temperature of coolant provided to the electronics that minimizes the overall power consumption of the system.

BACKGROUND

In a manner similar to an electric light bulb that gives off heat when it is powered on, electronics such as computer equipment dissipate electrical power as heat. In many cases, to prevent the electronics from overheating, that heat must be removed by a cooling system, typically using a cooling fluid such as air or water that is passed through the electronics. The current invention applies to all types of cooling fluids and many types of electronic equipment, but for specificity, consider the example of a liquid-cooled computer. Referring to the traditional system 100 shown in FIG. 1, electrical power is consumed by liquid-cooled computers 102 in a machine-room 104. The computers' electrical power is dissipated as an arbitrary number of power consumptions (or heat loads) P_(A), P_(B), . . . , P_(X), which are removed to the outside world in several steps.

First, the power consumption of the electronics (or aggregate machine-room heat load)

P ₀ ≡P _(A) +P _(B) + . . . +P _(X)  (1)

is transferred from the computers to a liquid coolant 106, which enters each computer at a cold temperature T₀, and flows under pressure created by pump 108 in a closed loop of liquid-coolant pipes. The pump 108 creates an additional power consumption (or heat load) P₁.

Second, in chiller 110, the combined heat load P₀+P₁ is transferred from the liquid coolant 106 to a refrigerant 112, which flows in a closed loop of refrigerant pipes, wherein evaporation of the refrigerant 112 occurs during its absorption of heat load P₀+P₁, and compression of the refrigerant occurs in a compressor 114. The compressor 114 creates an additional heat load α(P₀+P₁) that is proportional to the incident heat load P₀+P₁. The proportionality factor α is a characteristic of the compressor 114. The temperature T₀ entering computers 102 is maintained by feedback circuitry comprising a temperature-measurement device 116 and a processing device 118 that compares the measured value T₀ to a user-defined set-point temperature (T₀)_(Set), and commands the chiller 110 to drive the difference T₀−(T₀)_(Set) to zero by modulation of the compressor 114.

Third, heat load (1+α)(P₀+P₁) is transferred from the refrigerant 112 to condenser water 120, which flows under pressure created by pump 122 in a closed loop of condenser-water pipes, with the transfer of heat (1+α)(P₀+P₁) causing condensation of the refrigerant 112. The pump 122 creates an additional heat load P₂.

Fourth, by means of a cooling tower 124, which typically contains an air mover 126 that produces an additional heat load P₃, heat load (1+α)(P₀+P₁)+P₂+P₃ is transferred from the condenser water 120 to the outside air.

The “useful” electrical power consumed in the machine room 104 by computers 102 is P₀, whereas the “overhead” power consumed by the cooling equipment in system 100 is, by inspection of FIG. 1,

P _(Cool) ⁽¹⁰⁰⁾ ≡P ₁+α(P ₀ +P ₁)+P ₂ +P ₃.  (2)

Thus, the total electrical power consumed by system 100 is

P _(Total) ⁽¹⁰⁰⁾ =P ₀ +P _(Cool) ⁽¹⁰⁰⁾.  (3)

In equations (2) and (3), superscript “(100)” indicates that the symbols apply to system 100.

The dominant term on the right-hand side of equation (2) is the compressor power α(P₀+P₁), where the chiller-overhead fraction α is typically 0.10 to 0.20. Consequently, to save some or all of the compressor power α(P₀+P₁), the concept of “free cooling” has been developed in prior art, as described, for example, in “Free Cooling Using Water Economizers”, by Susanna Hanson and Jeanne Harshaw, TRANE Engineers Newsletter, Volume 37-3, September 2008, which is incorporated herein in its entirety by reference. It is worth noting that in the United States, the overhead fraction α is often expressed in kW/ton, where a ton of cooling is 3.517 kW. Thus, for example, 0.53 kW/ton corresponds to the dimensionless value α=0.53/3.517=0.15.
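By way of non-limiting illustration, the unit conversion above and equations (2) and (3) may be sketched in Python as follows. The function names and the example loads are hypothetical and are chosen only to show the arithmetic; they are not measurements of any actual system.

```python
KW_PER_TON = 3.517  # one ton of cooling expressed in kW, as quoted in the text


def overhead_fraction(kw_per_ton):
    """Convert a chiller rating in kW/ton to the dimensionless fraction alpha."""
    return kw_per_ton / KW_PER_TON


def p_cool_100(p0, p1, p2, p3, alpha):
    """Equation (2): overhead power of the traditional system 100."""
    return p1 + alpha * (p0 + p1) + p2 + p3


def p_total_100(p0, p1, p2, p3, alpha):
    """Equation (3): useful power plus cooling overhead for system 100."""
    return p0 + p_cool_100(p0, p1, p2, p3, alpha)


alpha = overhead_fraction(0.53)
print(round(alpha, 2))  # 0.15

# Hypothetical loads in kW: computers, coolant pump, condenser pump, air mover.
print(p_total_100(p0=100.0, p1=2.0, p2=1.0, p3=1.0, alpha=0.15))
```

With these assumed loads, the compressor term α(P₀+P₁) is several times larger than the pump and air-mover terms combined, which is the situation the text describes as typical.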

Referring to FIG. 2, a typical “free cooling” system 200 is to some extent similar to the traditional system 100; namely, callouts 202 through 226 of system 200, shown on FIG. 2, are exactly analogous, respectively, to callouts 102 through 126 of system 100, shown on FIG. 1. However, system 200 is distinguished from system 100 by the addition of a free-cooling heat exchanger 230 to the liquid-coolant loop 206, which allows the liquid coolant 206 to reject some of its heat load P₀+P₁ to free-cooling water 232. Let

β≡Fraction of heat load P₀+P₁ that coolant 206 rejects to free-cooling water 232 via the free-cooling heat exchanger 230.  (4)

Free-cooling water 232 flows under pressure created by a pump 234 in a closed loop of pipes. The pump 234 creates an additional heat load P₆. By means of a cooling tower 238, which typically contains an air mover 240 that produces additional heat load P₇, a heat load β(P₀+P₁)+P₆+P₇ is transferred from the free-cooling water 232 to outside air.

System 200 is further distinguished from system 100 by the addition of feedback circuitry comprising a device 242 to measure temperature T₃ and to communicate this measurement to processing device 218. System 200 is further distinguished by the addition of electrical feedback from processing device 218 to pump 234 to enable speed modulation of the pump in cases where T₃<T₀, so that the liquid coolant does not become too cold, and also to enable powering off the pump in cases where T₃>T₁, to prevent undesired heating of the liquid coolant as it passes through the free-cooling heat exchanger 230.

Because of the heat-load rejection β(P₀+P₁) from liquid coolant 206 by means of heat exchanger 230, the amount of remaining heat to be rejected to the refrigerant 212 in system 200 is (1−β)(P₀+P₁). Thus, the incident heat load on the compressor 214 in system 200 is a factor of 1−β smaller than that on the compressor 114 in system 100. Because we assume the compressors 114 and 214 to be otherwise identical, the power consumption of compressor 214 in system 200 is likewise reduced by the factor 1−β compared to compressor 114 in system 100. That is, system 200's compressor 214 consumes only (1−β)α(P₀+P₁).

In comparing systems 100 and 200 to assess how much power is saved by “free cooling”, the useful electrical power P₀ consumed in the machine room 204 of system 200 is naturally assumed to be the same as that consumed in the machine room 104 of system 100. This assumption merely states that, to make a fair comparison, the computers 202 in system 200 are identical to the computers 102 in system 100, and are performing identical computations.

The “overhead” power consumed by cooling equipment in system 200 is

P _(Cool) ⁽²⁰⁰⁾ ≡P ₁+(1−β)α(P ₀ +P ₁)+P ₄ +P ₅ +P ₆ +P ₇.  (5)

Comparing equations (2) and (5) yields the power-saving advantage of system 200 over system 100:

ΔP≡P _(Cool) ⁽¹⁰⁰⁾ −P _(Cool) ⁽²⁰⁰⁾=βα(P ₀ +P ₁)+(P ₂ +P ₃)−(P ₄ +P ₅ +P ₆ +P ₇)  (6)

The power consumed by pumps (P₁, P₄, P₆) and air movers (P₅, P₇) is typically small compared to that consumed by the refrigerant-loop compressor, so the first term on the right-hand side of equation (6), βα(P₀+P₁), is the dominant term. Moreover, if the pumps 122, 222, and 234 as well as the cooling-tower air movers 126, 226, and 240 are controlled so that power consumed is proportional to incident heat load, then

P ₄=(1−β)P ₂ ; P ₆ =βP ₂  (7)

and

P ₅=(1−β)P ₃ ; P ₇ =βP ₃.  (8)

Assuming this type of control, by combination of equations (7) and (8),

P ₄ +P ₅ +P ₆ +P ₇ =P ₂ +P ₃,  (9)

whence, substituting equation (9) into equation (6), the second and third terms on the right-hand side of equation (6) cancel, and the power saving from free cooling becomes simply

ΔP≡P _(Cool) ⁽¹⁰⁰⁾ −P _(Cool) ⁽²⁰⁰⁾=βα(P ₀ +P ₁).  (10)

Equation (10) implies that to maximize the amount of saved power ΔP, β=1 is desired, in which case the entire incident heat load P₀+P₁ is rejected by the free-cooling heat exchanger 230. In such a scenario, the chiller 210, pump 222, and cooling tower 224 may be turned off. In fact, if β=1 can be achieved at all times, then the chiller 210, pump 222, and cooling tower 224 are superfluous and need not be purchased. This is the most aggressive objective of the free-cooling paradigm.
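The reduction of equation (6) to equation (10) under the proportional control of equations (7) and (8) may be checked numerically. The following Python sketch is illustrative only; the function names and the sample loads are hypothetical.

```python
def p_cool_100(p0, p1, p2, p3, alpha):
    """Equation (2): cooling overhead of system 100."""
    return p1 + alpha * (p0 + p1) + p2 + p3


def p_cool_200(p0, p1, p4, p5, p6, p7, alpha, beta):
    """Equation (5): cooling overhead of the free-cooling system 200."""
    return p1 + (1.0 - beta) * alpha * (p0 + p1) + p4 + p5 + p6 + p7


def free_cooling_saving(p0, p1, p2, p3, alpha, beta):
    """Equation (6), with pump and air-mover power split per (7) and (8)."""
    p4, p6 = (1.0 - beta) * p2, beta * p2  # equation (7)
    p5, p7 = (1.0 - beta) * p3, beta * p3  # equation (8)
    return (p_cool_100(p0, p1, p2, p3, alpha)
            - p_cool_200(p0, p1, p4, p5, p6, p7, alpha, beta))


# Hypothetical loads in kW and an assumed free-cooling fraction.
p0, p1, p2, p3, alpha, beta = 100.0, 2.0, 1.0, 1.0, 0.15, 0.6
saving = free_cooling_saving(p0, p1, p2, p3, alpha, beta)
expected = beta * alpha * (p0 + p1)  # equation (10)
print(abs(saving - expected) < 1e-9)  # True
```

Because P₄+P₆=P₂ and P₅+P₇=P₃ under this control, the pump and air-mover terms cancel exactly, and only the compressor term βα(P₀+P₁) remains.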

Still referring to FIG. 2, this aggressive objective can typically only be achieved, unfortunately, by permitting the coolant 206 that enters computers 202 to have a high temperature T₀. In general, from energy arguments, the relation between β and coolant temperatures is

$\begin{matrix} {\beta = {\frac{T_{1} - T_{2}}{T_{1} - T_{0}}.}} & (11) \end{matrix}$

Consequently, if the chiller 210 is turned off or absent, then β=1, so, according to equation (11), T₂=T₀. But T₂ is weather dependent, because the temperature T₃ of water returning from the cooling tower 238 depends on the wet-bulb temperature T_(WB) of ambient outside air, which depends on geographical location and season. The wet bulb temperature T_(WB) currently never exceeds 31° C. anywhere on earth, but may be higher in the future due to climate change, as reported by Steven C. Sherwood and Matthew Huber in “An adaptability limit to climate change due to heat stress”, Proceedings of the National Academy of Sciences, May 2010, (0913352107), which is included herein in its entirety by reference. Temperature T₃ exceeds T_(WB) by an amount ΔT_(CT), sometimes called the “cooling-tower approach temperature”, which is a function of several variables (see the aforementioned TRANE Engineers Newsletter) but typically in the range of 1 to 5° C. Thus,

T ₃ =T _(WB) +ΔT _(CT).  (12)

Moreover, temperature T₂ exceeds T₃ by an amount ΔT_(HX), sometimes called the “heat-exchanger approach temperature”, which is typically in the range of 1 to 2° C. Thus,

T ₂ =T _(WB) +ΔT _(CT) +ΔT _(HX).  (13)

Consequently, to achieve year-round free cooling anywhere on earth under current climate conditions, the water-inlet temperature T₀ to computers 202 may need to be as high as

(T ₀)_(max)≡(T _(WB))_(max)+(ΔT _(CT))_(max)+(ΔT _(HX))_(max)=31+5+2=38° C.  (14)

In contrast,

(T ₀)_(min)≡16° C.  (15)

is just warm enough to avoid condensation of air-borne moisture for “Class 1” machine-room conditions, as specified by the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) in “Thermal Guidelines for Data Processing Environments, 2^(nd) edition”, ISBN 978-1-933742-46-5, which is included herein in its entirety by reference. Specifically, ASHRAE's “Recommended” Class 1 envelope in psychrometric space is bounded above by a 15° C. dew-point, which implies, with a 1° C. margin of safety, that T₀=16° C. is the minimum safe temperature at which air-borne water is guaranteed not to condense inside the computers 202.
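Equations (11) through (14) may be illustrated with a short Python sketch. It is assumed here, from the form of equation (11) and FIG. 2, that T₁ and T₂ are the coolant temperatures entering and leaving the free-cooling heat exchanger 230, respectively; the function names are hypothetical.

```python
def free_cooling_fraction(t0, t1, t2):
    """Equation (11): beta = (T1 - T2) / (T1 - T0), temperatures in deg C."""
    if t1 <= t0:
        raise ValueError("T1 must exceed T0 for the coolant to carry a heat load")
    return (t1 - t2) / (t1 - t0)


def free_cooled_temperature(t_wb, dt_ct, dt_hx):
    """Equation (13): T2 = T_WB + dT_CT + dT_HX."""
    return t_wb + dt_ct + dt_hx


# Worst case quoted in the text: 31 C wet bulb, 5 C and 2 C approach temperatures.
print(free_cooled_temperature(31.0, 5.0, 2.0))  # 38.0, as in equation (14)

# Assumed example: coolant returns at 30 C and the set point is 20 C.
print(free_cooling_fraction(t0=20.0, t1=30.0, t2=25.0))  # 0.5
print(free_cooling_fraction(t0=20.0, t1=30.0, t2=20.0))  # 1.0, chiller not needed
```

When the free-cooled temperature T₂ reaches the set point T₀, the fraction β equals 1; when weather holds T₂ above T₀, β falls below 1 and the chiller must remove the remainder.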

Less aggressive than equation (14), a more typical free-cooling option is to choose a moderate set-point value of T₀, and to retain the chiller 210, pump 222, and cooling tower 224 so that, when weather precludes the free-cooled temperature T₂ from reaching the desired set-point T₂=(T₀)_(Set), the chiller can make up the difference. Even in such systems, with a chiller present as in system 200, users of computers and other electronic equipment urge manufacturers to design their equipment to allow high inlet temperature T₀ so that, despite adverse weather conditions, the fraction β of the heat load removed by the heat exchanger 230 remains high, and thus the power-saving ΔP, given by equation (10), remains large. This strategy says that maximizing the amount of free cooling is always better—even if it means higher water inlet temperature T₀ to the computers 202.

The current invention calls this strategy into question, purely on the basis of power savings. It explains why maximizing the amount of free cooling, regardless of T₀, does not necessarily save energy, but may actually waste energy. Based on this insight, the invention provides, for a system like system 200, an innovative method and apparatus to determine the value of T₀ that actually provides the greatest conservation of energy. Depending on the weather-related temperature T₃ and the computational state C, the power-optimal solution may be for the chiller 310 to provide some or even all of the cooling, despite the presence of the free-cooling heat exchanger 330.

SUMMARY OF THE INVENTION

For computers and other electronic equipment, power dissipation depends on temperature. In particular, for CMOS (Complementary Metal-Oxide-Semiconductor) circuits, power is an increasing function of junction temperature T_(j) of CMOS transistors, due to a component of leakage power called sub-threshold leakage, which depends exponentially on temperature, as described, for example, in Leakage in Nanometer CMOS Technologies, edited by Siva G. Narendra and Anantha Chandrakasan, published by Springer, 2010, ISBN-13 978-1-4419-3826-8, which is included herein in its entirety by reference. As a result, power dissipation in CMOS circuits is an increasing function of the inlet temperature T₀ of the coolant, because increasing T₀ by ΔT also increases T_(j) directly by ΔT, and even slightly more, because the additional sub-threshold leakage power attending the higher T_(j) increases the cooling load, leading to a small additional temperature burden through the cooling path.

Combining the understanding that power dissipation depends on temperature with the foregoing discussion relating to FIGS. 1 and 2, it appears that power conservation in electronics is antithetical to power conservation in the associated cooling system. On the one hand, to conserve power in the electronics, the inlet temperature T₀ of the coolant should always be as low as possible; namely T₀=(T₀)_(min), where (T₀)_(min) is specified by equation (15). On the other hand, as explained in the foregoing discussion, to conserve power in the cooling system despite weather, T₀ should be allowed to be as high as possible, so that free-cooling may be fully used, year-round, in all climates.

According to this invention, the above dichotomy suggests optimization: by striking a balance between the needs of the electronics and the capabilities of the cooling system, it is possible to find an optimum temperature T*₀ that minimizes the aggregate power consumption of the electronic equipment and the cooling equipment combined. The invention specifies methods and apparatus to choose, for a given system, the optimum value T*₀ of the set-point temperature (T₀)_(Set), despite the fact that this optimum is a function of the weather as well as the computational work-load of the computers 202. Further, the invention shows by means of a mathematical model that, for certain conditions, the optimum temperature T*₀ is likely to be lower than what is suggested by prior-art that focuses solely on P_(Cool), but ignores the temperature dependence of P₀.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a schematic view of a prior-art apparatus comprising a machine room and conventional cooling equipment.

FIG. 2 illustrates a schematic view of a prior-art apparatus comprising a machine room with conventional cooling equipment that is enhanced with free-cooling equipment.

FIG. 3 illustrates a schematic view of an apparatus according to a first embodiment of the invention, including a coolant-temperature selecting device that minimizes total power consumption P_(Total), which is the sum of electronics' power consumption P₀ and cooling-equipment power consumption P_(Cool).

FIG. 4 illustrates pseudo-code of a calibration algorithm for the first embodiment of the invention.

FIG. 5 illustrates a schematic view of an apparatus according to a second embodiment of the invention, including an alternative coolant-temperature selecting device that minimizes total power consumption P_(Total), which is the sum of electronics' power consumption P₀ and cooling-equipment power consumption P_(Cool).

FIG. 6 illustrates a calibration algorithm for the second embodiment of the invention.

FIG. 7 illustrates two examples of an operational algorithm for the second embodiment of the invention.

FIG. 8 illustrates the typical thermal path between a heat-producing electronic device and a liquid coolant.

FIG. 9 plots the parameter κ vs. CMOS transistor gate length, where κ is a measure of how strongly the power consumption of the transistor depends on temperature.

FIG. 10 illustrates consumed power vs. temperature for a typical, modern computer system, which is used experimentally to quantify the potential advantages of the invention.

FIG. 11 tabulates various parameters used in a mathematical model that quantifies the advantages of the invention.

FIG. 12 illustrates consumed power vs. temperature for three computational states of the computer used to test the advantages of the invention.

FIG. 13 illustrates the data given in FIG. 12 renormalized.

FIG. 14 illustrates the dependence of a reference-power parameter on the computational-state parameter λ.

FIG. 15 illustrates the dependence of a second reference-power parameter on the computational-state parameter λ.

FIGS. 16 through 24 illustrate results of the mathematical model for differing values of T₃, each Figure plotting total consumed power vs. coolant temperature T₀ and the computational-state parameter λ.

FIG. 25, derived from FIGS. 16-24, illustrates a comparison of the minimum power consumed using 100% free cooling vs. the power consumed at a low coolant temperature in the absence of free cooling, as a function of the computational-state parameter λ and the weather-related temperature T₃.

FIG. 26, also derived from FIGS. 16-24, plots the optimal, P_(Total)-minimizing coolant temperature as a function of the computational-state parameter λ and the weather-related temperature T₃.

FIG. 27, derived from FIG. 26, plots in (T₃, λ) space the locus of points, denoted the “downshift boundary”, where the power-minimizing optimal solution for coolant temperature suddenly shifts from a high value to a low value, thereby dividing the space into two regions, one where 100% free cooling is optimal, and the other where 100% chiller cooling is optimal.

FIG. 28 is a table describing how various parameters are varied on FIGS. 29-32.

FIG. 29 is similar to FIG. 27, except that parameter κ takes on several values, thereby showing how the downshift boundary depends on κ.

FIG. 30 is similar to FIG. 27, except that parameter

takes on several values, thereby showing how the downshift boundary depends on

.

FIG. 31 is similar to FIG. 27, except that a dimensionless grouping of parameters, $\frac{\rho cV}{UA}$, takes on several values, thereby showing how the downshift boundary depends on $\frac{\rho cV}{UA}$.

FIG. 32 is similar to FIG. 27, except that parameter α takes on several values, thereby showing how the downshift boundary depends on α.

FIG. 33 illustrates pseudo-code similar to FIG. 4, except that the algorithm is simplified in light of the mathematical model.

FIG. 34 illustrates pseudo-code similar to FIG. 6, except that the algorithm is simplified in light of the mathematical model.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

This invention exploits the fact that the power consumption of computers increases when the temperature T₀ of the cooling fluid increases, whereas the power consumption of the associated cooling equipment decreases as T₀ increases, because larger T₀ allows for more “free cooling”. This dichotomy suggests that, for given conditions, an optimum, power-minimizing value of T₀ exists, denoted T₀*, a value that will minimize overall energy consumption. The invention specifies how this optimum may be found as a function of weather conditions and the computational state of the computers being cooled.

Referring to FIG. 3, a first embodiment of this invention specifies a system 300, which is a modified form of the prior-art system 200. In particular, elements 302 through 342 of system 300, shown on FIG. 3, are exactly analogous, respectively, to elements 202 through 242 of system 200, shown on FIG. 2. However, system 300 is distinguished from system 200 by the addition of a number of elements, described presently.

The total power consumption P_(Total) for system 300 is the sum of two terms, as follows,

P _(Total)(T ₀ ,T ₃ ,C)=P ₀(T ₀ ,C)+P _(Cool)(T ₀ ,T ₃ ,C),  (16)

where the parenthetical lists in equation (16) denote parameters upon which the various components of power depend, and the following definitions apply:

-   P₀ = Power consumption of the computers 302;
-   C = Computational state of the computers 302. For example, one state may be “idle”, another may be “running program XYZ”, another may be “50% loaded”, etc. The set of states C may be defined in many ways, and will depend on the type of computer involved;
-   T₀ = Coolant temperature at entrance to the computers 302, as selected by (T₀)_(Set);
-   P_(Cool) ≡ P₁+(1−β)α(P₀+P₁)+P₄+P₅+P₆+P₇ (the sum of the power dissipated by all cooling equipment in system 300);
-   T₃ ≡ Temperature at which cold water from the cooling tower 338 enters heat exchanger 330 ≡ T_(WB)+ΔT_(CT).

In the latter equality, T_(WB) is the wet-bulb temperature at cooling tower 338, and ΔT_(CT) is the cooling-tower “approach temperature”, as discussed in connection with equation (12). Thus T₃ is dependent on weather.

The optimal value T*₀ of the coolant inlet temperature T₀ is the value that minimizes the total power P_(Total) in equation (16). Thus, T*₀ is a function of the weather-related temperature T₃ and the computational state C. With the goal of determining this functional dependence T*₀(T₃, C), system 300 comprises several elements missing from the prior-art system 200.

First, system 300 comprises a storage device 344 in which measured and computed data may be stored and retrieved by processing device 318.

Second, system 300 comprises a power-measurement device 346 that measures the aggregate electrical power P₀ being consumed by all the computers 302 in machine room 304.

Third, system 300 comprises one or more power-measurement devices 348 that collectively measure the aggregate electrical power P_(Cool) consumed by all the cooling equipment in system 300:

P _(Cool) ≡P ₁+(1−β)α(P ₀ +P ₁)+P ₄ +P ₅ +P ₆ +P ₇.  (17)

If measuring all terms of P_(Cool) is too expensive, a possible (though inexact) alternative is to measure chiller power only, P_(Cool)≈(1−β)α(P₀+P₁), because chiller power is typically the dominant term on the right-hand side of equation (5). In either case, the power measurements are saved in the storage device 344.

Fourth, referring to FIG. 4 as well as FIG. 3, system 300 comprises a calibration algorithm 400, written in FIG. 4 as pseudo-code, which is stored in the storage device 344 and executed by the processing device 318 when the system 300 is first started. Algorithm 400 finds the array T₀*(T₃, C) experimentally.

Lines (402) and (404) of calibration algorithm 400 theoretically execute a loop on T₃, starting at a minimum value (T₃)_(min) and proceeding stepwise to a maximum value (T₃)_(max). The loop variable corresponding to T₃ is denoted i. This representation of the loop on T₃ as a neat, orderly progression from (T₃)_(min) to (T₃)_(max) in equal steps is theoretical rather than actual. In reality, because T₃ depends on weather, which cannot be algorithmically dictated, iterations of this loop may actually occur in random order, and at weather-dependent values of T₃, over an extended time after system 300 is first installed: when the weather changes to produce a value of T₃ substantially different from values for which algorithm 400 has already been run, an additional iteration of this loop is executed. It may take some time (a year or longer) for T₃ to run its slowly-varying, seasonal course from extreme to extreme, so iterations of the “for loop” represented by line 402 of algorithm 400 may actually be executed sporadically over an extended period. Thus, in a practical implementation of algorithm 400, line 404 will cause sensor 342 to read temperature T₃, and the remainder of the iteration of the “for i” loop will proceed if and only if a similar value of T₃ has never been encountered in previous iterations of the loop. Such a strategy prevents needless, redundant experimentation.
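One possible form of the test for a previously encountered value of T₃ is sketched below in Python. The function name and the tolerance of 1° C. are assumptions made for illustration; the description above requires only that the new value be substantially different from values already calibrated.

```python
def needs_calibration(t3_now, t3_calibrated, tolerance=1.0):
    """Return True if no calibrated T3 lies within `tolerance` deg C of t3_now.

    `t3_calibrated` is the list of T3 values for which the calibration
    has already been run; `tolerance` is an assumed threshold.
    """
    return all(abs(t3_now - t3) > tolerance for t3 in t3_calibrated)


calibrated = [10.0, 15.0]
print(needs_calibration(20.0, calibrated))  # True: new weather condition
print(needs_calibration(15.4, calibrated))  # False: close to 15.0 already
```

A larger tolerance shortens the calibration campaign at the cost of a coarser table in T₃.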

Lines (406) and (408) of calibration algorithm 400 execute a loop on “C”, the variable representing the computational state of the computers 302 in machine room 304. The precise definition of the “computational state” C_(j) depends on the types of computers 302 contained in machine-room 304, as well as the types of computational tasks assigned to them. For the embodiment of the invention that utilizes calibration algorithm 400, it is assumed that:

-   i. an array of well-defined computational states C_(j) of computers 302 can be identified a priori;
-   ii. the computers 302 can be commanded by processing device 318 to execute one of these states;
-   iii. computational state C_(j) persists long enough to perform iteration j of the loop represented by lines (406) through (434) of calibration algorithm 400;
-   iv. the power consumption P₀ of each state, although a function of time t denoted P₀(t) that is not necessarily constant, nevertheless has the property that the time-averaged power consumption

$\begin{matrix} {{\overset{\_}{P}}_{0} \equiv {\frac{1}{\tau}{\int_{t = 0}^{\tau}{{P_{0}(t)}\; dt}}}} & (18) \end{matrix}$

    is substantially constant if the averaging is done over a sufficiently long time τ; and
-   v. the time allocated in calibration algorithm 400 for measuring P₀(t) to obtain the average P̄₀ is at least τ.
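The time average referred to in assumptions iv and v may be estimated from discrete power samples. The Python sketch below integrates the samples by the trapezoidal rule and divides by the averaging interval; the choice of the trapezoidal rule and the function name are assumptions made for illustration.

```python
def time_averaged_power(times, powers):
    """Estimate the time-averaged power from samples P(t_k).

    `times` are sample instants in seconds (strictly increasing) and
    `powers` are the corresponding readings. The integral of P(t) is
    approximated by the trapezoidal rule and divided by the elapsed time.
    """
    if len(times) != len(powers) or len(times) < 2:
        raise ValueError("need at least two matching samples")
    integral = 0.0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        if dt <= 0:
            raise ValueError("sample times must be strictly increasing")
        integral += 0.5 * (powers[k] + powers[k - 1]) * dt
    return integral / (times[-1] - times[0])


print(time_averaged_power([0.0, 1.0, 2.0], [100.0, 100.0, 100.0]))  # 100.0
print(time_averaged_power([0.0, 1.0, 2.0], [0.0, 10.0, 20.0]))      # 10.0
```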

Line (408) of calibration algorithm 400 directs the computers 302 to execute the computational task designated by C_(j). For most computers, one of these computational tasks, denoted C₀, will be “do nothing”, in which the computers are powered on but idle. Computers often spend considerable time in the idle state. For CMOS-based computers in the idle state, sub-threshold leakage, which is exponentially dependent on absolute temperature, is the dominant form of leakage, and may comprise a major portion of power dissipation. Consequently, for the idle state, providing a low coolant-inlet temperature T₀ is likely to save considerably more power in the computers 302 than it costs in the cooling equipment, even if weather forces the chiller 310 to be used to achieve the lower T₀. For other computational tasks in CMOS-based computers, where temperature-independent transistor-switching power may dominate P₀, the trade-off between P₀ and P_(Cool) is complex, and minimization of P_(Total) is best done experimentally; for example, in the manner prescribed by calibration algorithm 400.

Line (410) of calibration algorithm 400 initializes, to the value (T₀)_(min) specified by equation (15), each element of a two-dimensional array T*₀(i, j) that will ultimately hold the optimum values of T₀.

Line (412) of calibration algorithm 400 initializes the variable P*, which is defined mathematically as

$\begin{matrix} {{{P^{*} \equiv {\min\limits_{T_{0}}\left( {{\overset{\_}{P}}_{0} + {\overset{\_}{P}}_{Cool}} \right)}}}_{T_{3},C},} & (19) \end{matrix}$

where P̄₀ and P̄_(Cool) are time-averaged values of P₀ and P_(Cool) respectively. That is, over the range of values of coolant-inlet temperature T₀ to be tested, P* is the minimum total electrical power consumed for the current values, (T₃)_(i) and C_(j), of the loop parameters T₃ and C. Initialization of P* to the huge number 9E99, a common programming artifice, ensures that, in the subsequent search for minimum power, the initialized value exceeds the first actual value, for reasons described presently.

Line (414) of calibration algorithm 400 executes a loop on the set-point coolant-inlet temperature (T₀)_(set), from a minimum of (T₀)_(min) to a maximum of (T₀)_(max-allowed), in increments of ΔT₀, where (T₀)_(min) is given by equation (15), (T₀)_(max-allowed) is the maximum coolant-inlet temperature allowed by computers 302, and ΔT₀ is chosen at will, but is perhaps 0.5 or 1.0° C.

Line (416) of calibration algorithm 400 pauses program execution while the system comes to thermal equilibrium, such that the measured temperature (T₀)_(meas) is within a tolerance ε of the set-point value (T₀)_(set).

Line (418) of calibration algorithm 400 performs a first reading process to obtain the time-averaged power P̄₀ consumed by the computers 302 for the current value of the set-point coolant temperature (T₀)_(set). In this first reading process, samples of the time-varying function P₀(t) are measured by the power-measurement device 346, reported to the processing device 318, and the time-averaged power P̄₀ is computed according to equation (18).

Line (420) of calibration algorithm 400 performs a second reading process to obtain the time-averaged power P̄_(Cool) consumed by the cooling equipment for the current value of coolant temperature T₀. In this second reading process, samples of the time-varying function P_(Cool)(t) are measured by the power-measurement device 348, reported to the processing device 318, and the time-averaged power

$\begin{matrix} {{\overset{\_}{P}}_{Cool} \equiv {\frac{1}{\tau}{\int_{t = 0}^{\tau}{{P_{Cool}(t)}\; dt}}}} & (20) \end{matrix}$

is computed. Time averaging is expected to be less important for P_(Cool)(t) than for P₀(t) because P_(Cool)(t) is typically slowly varying. Nevertheless, the averaging process (20) makes the measurement more robust.

Line (422) of calibration algorithm 400 sums the time-averaged computer power consumption P̄₀ and the time-averaged cooling-equipment power consumption P̄_(Cool), thereby computing the total time-averaged power consumption P̄_(Total) of system 300 for the current values of (T₀)_(set), T₃, and C.

Line (424) of calibration algorithm 400 tests, for the current values of T₃ and C, namely T₃[i] and C[j], whether the power consumption P _(Total) found for the current value of set-point coolant temperature (T₀)_(set) is less than the current value of P*. If so, line (426) assigns the current time-averaged power consumption P _(Total) as the new minimum P* for this (i, j), and line (428) assigns the current value of (T₀)_(set) as the new optimum temperature T*₀ for this (i, j), saving this value in the two-dimensional array T*₀[i][j]. On the first iteration of the loop on (T₀)_(set), line (426) is guaranteed to be executed because of the initialization P*=9E99 at line (412). On subsequent iterations of this loop, line (426) is executed if and only if P _(Total) found for the current value of (T₀)_(set) is less than every value found for the previously tested values of (T₀)_(set). Thus, when the loop on (T₀)_(set) completes, T*₀[i][j] holds the value of coolant temperature T₀ that minimizes total power P_(Total) for the current computational task C_(j) and the current value of the weather-dependent temperature T₃.

The net result of calibration algorithm (400) is the two-dimensional array T*₀[i][j], which is stored as a look-up table in the storage device 344, and which specifies the optimal coolant temperature that should be used for various combinations of weather conditions and computational tasks. The index i refers to the value of T₃ stored in the i^(th) element of the one-dimensional array T₃[i], which is also stored in storage device 344. Likewise, the index j refers to the j^(th) element of the one-dimensional array of computational states C[j], which is also stored in storage device 344 in a way that can be usefully identified later. For example, C[0] may hold the string “idle”, indicating for later usage that C[0] refers to the idle state of the computers 302.

After calibration algorithm 400 has been run to characterize system 300, during subsequent operation of the machine room 304, an operational algorithm 450 is executed by processing device 318. The operational algorithm 450 comprises the following steps:

-   -   Read the current computational state C from computers 302, and determine from the stored array C[j] the associated value of index j.
-   -   Read from sensor 342 the current value of temperature T₃, and determine the two values in the look-up table, say T₃[i] and T₃[i+1], that bracket the measured value T₃.
-   -   Compute the power-minimizing value of coolant temperature, denoted T*₀(T₃, C), by interpolation along the i dimension of the array T*₀[i][j] retrieved from storage device 344. That is, the interpolated optimum value of T₀ for arbitrary measured values of T₃ is computed as follows:

$\begin{matrix} {{T_{0}^{*}\left( {T_{3},C_{j}} \right)} = {{{T_{0}^{*}\lbrack i\rbrack}\lbrack j\rbrack} + {\frac{T_{3} - {T_{3}\lbrack i\rbrack}}{{T_{3}\left\lbrack {i + 1} \right\rbrack} - {T_{3}\lbrack i\rbrack}}\left( {{{T_{0}^{*}\left\lbrack {i + 1} \right\rbrack}\lbrack j\rbrack} - {{T_{0}^{*}\lbrack i\rbrack}\lbrack j\rbrack}} \right)}}} & (21) \end{matrix}$

It should be noted that equation (21) has the correct limiting behavior:

if T ₃ =T ₃ [i]:T* ₀(T ₃ ,C _(j))=T* ₀ [i][j]

if T ₃ =T ₃ [i+1]:T* ₀(T ₃ ,C _(j))=T* ₀ [i+1][j]  (22)

-   -   Set the (T₀)_(Set) register to the value T*₀(T₃, C) given by equation (21). By virtue of the feedback circuitry previously discussed in connection with FIG. 1, T₀ is controlled to the set-point temperature (T₀)_(Set). Thus, in accordance with this invention, the optimal coolant temperature T*₀ is provided which achieves the most energy-efficient operation of system 300 as a whole, over a range of weather conditions and computational states.
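By way of illustration only, the look-up and interpolation steps of operational algorithm 450 may be sketched in Python as follows. The function name, the argument names, and the clamping of out-of-range readings to the nearest table entry are assumptions made for this sketch; the description above specifies only the interpolation of equation (21) between bracketing table entries.

```python
from bisect import bisect_right


def interpolate_setpoint(t3_table, t0_opt_table, t3_measured, j):
    """Equation (21): linear interpolation of T0* along the i (T3) dimension.

    t3_table      -- calibrated T3[i] values, assumed sorted in ascending order
    t0_opt_table  -- T0*[i][j], optimal coolant temperature for each (i, j)
    t3_measured   -- current reading of T3
    j             -- index of the current computational state C[j]
    """
    # Assumption (not in the source): readings outside the calibrated range
    # are clamped to the nearest table entry rather than extrapolated.
    if t3_measured <= t3_table[0]:
        return t0_opt_table[0][j]
    if t3_measured >= t3_table[-1]:
        return t0_opt_table[-1][j]
    # Find i such that T3[i] <= t3_measured < T3[i+1].
    i = bisect_right(t3_table, t3_measured) - 1
    frac = (t3_measured - t3_table[i]) / (t3_table[i + 1] - t3_table[i])
    return t0_opt_table[i][j] + frac * (t0_opt_table[i + 1][j] - t0_opt_table[i][j])
```

When the measured T₃ coincides with a table entry T₃[i], the interpolation fraction is zero and the function returns T*₀[i][j], in agreement with the limiting behavior noted in equation (22).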

Referring now to FIG. 5, a second embodiment of the invention specifies a system 500, which is in many respects identical to system 300, but which differs from system 300 by elimination of the following elements:

-   -   i. The calibration algorithm 400. As discussed previously, iterations of the outer loop of calibration algorithm 400, because they depend on weather, may only execute sporadically, and it may thus take a long time to fill in the look-up table T₀*[i][j]. This is inconvenient in practice. Moreover, calibration algorithm 400 assumes that computational states C[j] are identifiable a priori, and are executable on demand at the pleasure of the calibration process. These assumptions may be false in practice.
-   -   ii. The need to store the arrays T_(WB)[i], C[j], and T*₀[i][j] in storage device 544. This follows from (i) because these arrays are produced by the calibration algorithm. Consequently, on FIG. 5 (in contrast to FIG. 3), these arrays have been erased from the block representing storage device 544.
-   -   iii. The need for identification of computational states C. Consequently, all references to computational states C have been eliminated from FIG. 5.

Rather, system 500 merely assumes that during operation of the computers 302 in machine room 304, the time-averaged powers P ₀ and P _(Cool) specified by equations (18) and (20) respectively are slowly varying compared to the time required to perform an operational algorithm 600 shown on FIG. 6.

Line (610) of operational algorithm 600, analogous to line (410) of calibration algorithm (400), initializes the variable T*₀ to the value (T₀)_(min). T*₀ will ultimately hold the optimum value of T₀ for current weather conditions and computational status of computers 502.

Line (612) of operational algorithm 600, analogous to line (412) of calibration algorithm (400), initializes the variable P*, which is defined mathematically as

$\begin{matrix} {P^{*} \equiv {\min\limits_{T_{0}}{\left( {{\overset{\_}{P}}_{0} + {\overset{\_}{P}}_{Cool}} \right).}}} & (23) \end{matrix}$

That is, over the range of values of coolant-inlet temperature T₀ to be tested, P* is the minimum total electrical power consumed under current conditions. Initialization of P* to the huge number 9E99, a common programming artifice, ensures that, in the subsequent search for minimum power, the initialized value exceeds the first actual value, for reasons described presently.

Line (614) of operational algorithm 600, analogous to line (414) of calibration algorithm 400, executes a loop on the set-point coolant-inlet temperature (T₀)_(set), from a minimum of (T₀)_(min) to a maximum of (T₀)_(max-allowed), in increments of ΔT₀, where (T₀)_(min) is given by equation (15), (T₀)_(max-allowed) is the maximum coolant-inlet temperature allowed by computers 502, and ΔT₀ is chosen at will, but is perhaps 0.5 or 1.0° C.

Line (616) of operational algorithm 600, analogous to line (416) of calibration algorithm 400, pauses program execution while the system thermally stabilizes, such that the measured temperature (T₀)_(meas) is within a tolerance ε of the set-point value (T₀)_(set).

Line (618) of operational algorithm 600, analogous to line (418) of calibration algorithm 400, performs a first reading process to obtain the time-averaged power P ₀ consumed by the computers 502 for the current value of the set-point coolant temperature (T₀)_(set). In this first reading process, samples of the time-varying function P₀(t) are measured by the power-measurement device 546, reported to the processing device 518, and the time-averaged power P ₀ is computed according to equation (18).

Line (620) of operational algorithm 600, analogous to line (420) of calibration algorithm 400, performs a second reading process to obtain the time-averaged power P _(Cool) consumed by the cooling equipment for the current value of coolant temperature T₀. In this second reading process, samples of the time-varying function P_(Cool)(t) are measured by the power measurement device 548, reported to the processing device 518, and the time-averaged power P _(Cool) is computed according to equation (20). Time averaging is expected to be less important for P_(Cool)(t) than for P₀(t) because P_(Cool)(t) should be slowly varying. Nevertheless, the averaging process (20) makes the measurement more robust.

Line (622) of operational algorithm 600, analogous to line (422) of calibration algorithm 400, sums the time-averaged computer power consumption P ₀ and the time-averaged cooling-equipment power consumption P _(Cool), thereby computing the total time-averaged power consumption P _(Total) of system 500 for the current value of (T₀)_(set).

Line (624) of operational algorithm 600, analogous to line (424) of calibration algorithm 400, tests whether the power consumption P _(Total) found for the current value of set-point coolant temperature (T₀)_(set) is less than the current value of P*. If so, line (626) assigns the current time-averaged power consumption P _(Total) as the new minimum P*, and line (628) assigns the current value of (T₀)_(set) as the new optimum temperature T*₀. On the first iteration of the loop on (T₀)_(set), line (626) is guaranteed to be executed because of the initialization P*=9E99 at line (612). On subsequent iterations of this loop, line (626) is executed if and only if P _(Total) found for the current value of (T₀)_(set) is less than every value found for the previously tested values of (T₀)_(set). Thus, when the loop on (T₀)_(set) completes, T*₀ holds the value of coolant temperature T₀ that minimizes total power P_(Total) for the current computational conditions and the current value of the weather-dependent temperature T₃.

Line (632) of operational algorithm 600 closes the loop on (T₀)_(set). At this point in the algorithm, T*₀ holds the optimal value of (T₀)_(set); that is, the value found to produce minimum total power P _(Total).

Line (634) of operational algorithm 600 assigns the optimal value T*₀ as the set-point value (T₀)_(Set) to be used during subsequent operation. Because coolant temperature T₀ entering computers 502 is controlled to (T₀)_(Set), it will thus be controlled to the optimal value T*₀.
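By way of illustration only, lines (610) through (634) may be sketched in Python as follows. The three callbacks are hypothetical stand-ins for the set-point register and the power-measurement devices; their names and signatures are assumptions made for this sketch and do not appear in the figures. The sketch uses floating-point infinity in place of the initialization 9E99, which serves the same purpose.

```python
import math


def find_optimal_setpoint(t0_min, t0_max_allowed, dt0,
                          apply_setpoint, read_p0_avg, read_pcool_avg):
    """Sweep (T0)_set and return (T0*, P*) minimizing P0_avg + PCool_avg.

    apply_setpoint(t0) -- hypothetical callback: command the set point and
                          block until (T0)_meas is within tolerance of it
    read_p0_avg()      -- hypothetical callback: time-averaged computer power
    read_pcool_avg()   -- hypothetical callback: time-averaged cooling power
    """
    t0_opt = t0_min                                   # line (610)
    p_min = math.inf                                  # line (612)
    # Number of whole increments that fit in the allowed range.
    steps = int(math.floor((t0_max_allowed - t0_min) / dt0 + 1e-9))
    for k in range(steps + 1):                        # line (614)
        t0_set = t0_min + k * dt0
        apply_setpoint(t0_set)                        # line (616)
        p_total = read_p0_avg() + read_pcool_avg()    # lines (618)-(622)
        if p_total < p_min:                           # line (624)
            p_min = p_total                           # line (626)
            t0_opt = t0_set                           # line (628)
    apply_setpoint(t0_opt)                            # line (634)
    return t0_opt, p_min
```

Because the comparison is a strict inequality, ties are resolved in favor of the lowest tested set point, as in the description of line (624).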

The optimal coolant temperature T*₀ will be valid as long as weather conditions and the computational conditions of the computers 502 remains similar to those that existed during execution of algorithm 600. Recognizing that these conditions, particularly computational conditions, may frequently change, lines (610) through (634) of the operational algorithm 600 must be re-executed from time to time, to check whether the current set-point temperature (T₀)_(Set) is still optimal. Two strategies may be pursued:

(a) Repetition of lines (610) through (634) of algorithm 600 may be done periodically, at regular intervals. This option is illustrated conceptually by the pseudo-code shown in FIG. 7 (a). Line (606) begins a loop that continues forever. On each iteration of this loop, lines (610) through (634) are executed, followed by line (636), which causes a delay of Δt. Consequently, once a value of optimal coolant temperature T*₀ is found, it is retained for the period of time required for iteration of the “while” loop in FIG. 7 (a), after which a new search for an optimal value of T*₀ is initiated by re-executing algorithm 600. The penalty for making Δt too short is that each traversal of algorithm 600 causes periods of operation at non-optimal coolant temperatures, because the loop comprising lines (614) through (632) tests a variety of set-point values (T₀)_(Set), most of which are not optimal. The penalty for making Δt too long is that, during this period, computational conditions may change, thus shifting the optimal value of T₀ away from the previously found value T*₀. This shift will not be detected until the next iteration of algorithm 600, so the longer Δt is, the longer non-optimal conditions will persist. Thus, choosing Δt is a compromise between these two penalties; the best compromise will be system dependent.

(b) Repetition of lines (610) through (634) of algorithm 600 may be triggered by a change in ongoing measurements of P ₀ and P _(Cool). This option is illustrated by pseudo-code shown in FIG. 7 (b). Line (608) begins a loop that continues forever. On each iteration of this loop, algorithm 600 is executed. Then lines (638) and (640) save the values of power P ₀ and P _(Cool), found in algorithm 600, into variables P ₀_OLD and P _(Cool)_OLD respectively. Finally, line (642) repeatedly measures P ₀ and P _(Cool), as indicated by the “Read( )” commands, and compares these values with P ₀_OLD and P _(Cool)_OLD respectively. If the absolute value of either difference is too large, execution of line (642) terminates, and a new iteration of the while loop at (608) is triggered, thereby causing a new search for an updated value of T*₀. The presumption here is that a change in either the measured computer power consumption P ₀ or in the measured cooling-equipment power consumption P _(Cool) may indicate a shift of the optimal value of T₀ away from the previously found value T*₀, thus requiring a search for the new optimum. For example, a large increase in P ₀ may indicate the change from the idle state of computers 502, where temperature-dependent leakage power may be a larger fraction of P ₀ than in a more-active computational state where temperature-dependent leakage is less important, thus requiring a reassessment of the optimal value of T₀.
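By way of illustration only, the monitoring step of strategy (b), corresponding to line (642), may be sketched in Python as follows. The callback names, the two tolerance arguments, and the polling interval are assumptions made for this sketch; the description says only that a new search is triggered when the absolute value of either difference is too large.

```python
import time


def wait_for_power_change(p0_old, pcool_old, read_p0_avg, read_pcool_avg,
                          tol_p0, tol_pcool, poll_seconds=0.0):
    """Block until either averaged power drifts beyond its tolerance.

    read_p0_avg / read_pcool_avg are hypothetical callbacks that return the
    latest time-averaged computer power and cooling power, respectively.
    """
    while True:
        p0 = read_p0_avg()
        pcool = read_pcool_avg()
        if abs(p0 - p0_old) > tol_p0 or abs(pcool - pcool_old) > tol_pcool:
            # A change this large suggests that the optimum may have moved,
            # so control returns to the caller, which re-runs the search.
            return p0, pcool
        time.sleep(poll_seconds)
```

An outer loop would call the search of algorithm 600, save the powers found there as the old values, and then call this function before repeating.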

Mathematical Model

Many features of the two embodiments discussed in the foregoing are illuminated by a mathematical model that captures the essence of the optimization involved in choosing a value of coolant temperature T₀ for computers such as 302 and 502 in FIGS. 3 and 5 respectively. Moreover, the mathematical model produces a surprising result that may be used to simplify considerably the practical implementation of the first embodiment's calibration algorithm 400 and the second embodiment's operational algorithm 600, as will be described later in additional embodiments. The following analysis applies to both systems 300 and 500; system 500 will be used as a reference.

1. Computer Power P₀

Referring to FIG. 8, consider a system such as 500 in which the computers 502 comprise an arbitrary number N of heat-producing assemblies 800. Each heat-producing assembly 800 comprises a heat-generating CMOS package 802, which is typically affixed by solder balls 804 to a circuit board 806. The CMOS package 802 is cooled by a cold head 808 that is separated therefrom by a layer of compliant, thermal-interface material 810. The cold head 808 comprises at least one coolant channel 812 wherein flows the coolant 506, which enters the cold head at temperature T₀. Each CMOS package contains a large number of heat-producing CMOS transistors whose junctions are all assumed to be at junction temperature T. Each transistor dissipates some power that is independent of temperature, such as active power and gate leakage power, as well as sub-threshold leakage power that is exponentially dependent on T, as described in Leakage in Nanometer CMOS Technologies, previously cited. These two components of dissipated power, aggregated to the level of the entire set of computers 502 of system 500, together make up the total power P₀, which may therefore be represented by the sum of two terms, one that is independent of junction temperature T, and the other that is exponentially dependent on T, whence

P ₀ =P _(ref){(1−λ)+λexp[κ(T−T _(ref))]},  (24)

or, equivalently,

$\begin{matrix} {\frac{P_{0}}{P_{ref}} = {1 + {\lambda {\left\{ {{\exp \left\lbrack {\kappa \left( {T - T_{ref}} \right)} \right\rbrack} - 1} \right\}.}}}} & (25) \end{matrix}$

In equations (24) and (25), for a given computational state, P_(ref), λ, and κ are positive constants, T is a representative junction temperature, and T_(ref) is an arbitrary reference temperature at which P₀=P_(ref); that is, P₀ would equal P_(ref) if the junction temperature T of all transistors in system 500 were held at the reference temperature T_(ref). For example,

T _(ref)=(T ₀)_(min)≡16° C.  (26)

is a useful choice, because then P₀=P_(ref) when T=(T₀)_(min)≡16° C., where the significance of (T₀)_(min)≡16° C. is discussed above in connection with equation (15).

In equation (24), λ is the parameter that controls the fraction of computer power that is exponentially dependent on temperature. This fraction is λ at T=T_(ref), but the fraction is greater than λ when T>T_(ref) due to the exponential growth. Specifically, for CMOS-based electronics, the second term in equation (24), P_(ref)λexp[κ(T−T_(ref))], represents the power consumed by CMOS-transistor sub-threshold leakage, whereas the first term in equation (24), P_(ref) (1−λ), represents the sum of all types of power not dependent on temperature, including CMOS-transistor switching power as well as CMOS-transistor leakage components caused by tunneling currents, such as gate leakage.

Consequently, the fraction λ is a function of computational state. For example, λ is relatively large when the transistor is off, because most transistor power in the off state is sub-threshold leakage power. However, λ becomes considerably smaller when the transistor is active, because active power and gate leakage then dominate.

Equation (25) may be written in the mathematically equivalent form

$\begin{matrix} {{\frac{P_{0}}{P_{ref}} = {1 + {\lambda \left\{ {{{\exp \left\lbrack {\kappa \left( {T - T_{0}} \right)} \right\rbrack}{\exp \left\lbrack {\kappa \left( {T - T_{ref}} \right)} \right\rbrack}} - 1} \right\}}}},} & (27) \end{matrix}$

in which the temperature difference T−T_(ref) is decomposed into (T−T₀)+(T₀−T_(ref)).

The temperature decomposition in equation (27) is useful because the temperature difference T−T₀ may be written as the product of two factors: first, the power

$\frac{P_{0}}{N}$

of one of the CMOS packages 802, and second, the thermal resistance $\mathcal{R}$ from the coolant flowing in cold head 808 to a typical transistor on the CMOS package 802. That is,

$\begin{matrix} {\mspace{79mu} {{{T - T_{0}} = {\left( \frac{P_{0}}{N} \right)\mathcal{R}}},\mspace{79mu} {where}}} & (28) \\ {\mathcal{R} \equiv {{Thermal}\mspace{14mu} {resistance}\mspace{14mu} {in}\mspace{14mu} {cooled}\mspace{14mu} {assembly}\mspace{14mu} 800\mspace{14mu} {from}\mspace{14mu} {coolant}\mspace{14mu} {to}\mspace{14mu} {transistor}\mspace{14mu} {{junction}.}}} & (29) \end{matrix}$

Thermal resistance $\mathcal{R}$ is a function only of the geometry and the thermal conductivities of the components comprising the thermal path from coolant to transistor junction. Substituting equation (28) into equation (27) yields

$\begin{matrix} {\frac{P_{0}}{P_{ref}} = {1 + {\lambda {\left\{ {{{\exp \left\lbrack {{\kappa \left( \frac{\; P_{ref}}{N} \right)}\left( \frac{P_{0}}{P_{ref}} \right)} \right\rbrack}{\exp \left\lbrack {\kappa \left( {T_{0} - T_{ref}} \right)} \right\rbrack}} - 1} \right\}.}}}} & (30) \end{matrix}$

Equation (30) is a nonlinear algebraic equation in the unknown ratio

$\frac{P_{0}}{P_{ref}}$

an equation that may readily be solved iteratively by starting with the initial guess

$\frac{P_{0}}{P_{ref}} = 1$

on the right-hand side of equation (30), thereby to compute an improved estimate of

$\frac{P_{0}}{P_{ref}}$

on the left-hand side of equation (30), this improved estimate to be inserted into the right-hand side of equation (30) to compute an even better estimate, and so on, until two successive estimates differ by a negligible amount.

In many cases, the set of N heat-producing devices is arranged in series along the coolant flow path. In such cases, the inlet coolant temperature T₀ is truly relevant only for the first heat-producing device in the series; devices further downstream see warmer coolant because the coolant has absorbed heat from upstream devices. Specifically, coolant at the downstream-most device has been heated by N−1 previous devices, so the average device sees warming from ½(N−1) previous devices. Consequently, coolant temperature for the average heat-producing device, denoted T ₀, is obtained by energy balance as

$\begin{matrix} {{\overset{\_}{T}}_{0} = {{T_{0} + \frac{\left( {N - 1} \right)\left( \frac{P_{0}}{N} \right)}{2\; \rho \; {cV}}} = {T_{0} + {\frac{N - 1}{2\; N}{\frac{P_{0}}{\rho \; {cV}}.}}}}} & (31) \end{matrix}$

where ρ=coolant density (1000 kg/m³ for water), c=coolant specific heat (4180 J/kg-° C. for water), and V=coolant flow rate. In analogy to equation (28), let $\overset{\_}{\mathcal{R}}$ be the thermal resistance of the path from coolant to transistor junction as computed using the averaged coolant temperature T ₀ rather than the inlet coolant temperature T₀:

$\begin{matrix} {{T - {\overset{\_}{T}}_{0}} = {\left( \frac{P_{0}}{N} \right)\overset{\_}{\mathcal{R}}}.} & (32) \end{matrix}$

Averaged versions of coolant temperature and thermal resistance may then be used in the following modified form of equation (30):

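$\begin{matrix} {\frac{P_{0}}{P_{ref}} = 1 + \lambda\left\{ \exp\left\lbrack \kappa\left( \frac{\overset{\_}{\mathcal{R}}\, P_{ref}}{N} \right)\left( \frac{P_{0}}{P_{ref}} \right) \right\rbrack \exp\left\lbrack \kappa\left( {\overset{\_}{T}}_{0} - T_{ref} \right) \right\rbrack - 1 \right\}.} & (33) \end{matrix}$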

Substituting (31) into (33) finally yields

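$\begin{matrix} {\frac{P_{0}}{P_{ref}} = 1 + \lambda\left\{ \exp\left\lbrack \kappa\left( \frac{\overset{\_}{\mathcal{R}}\, P_{ref}}{N} \right)\left( \frac{P_{0}}{P_{ref}} \right) \right\rbrack \exp\left\lbrack \kappa\left( T_{0} + \frac{N - 1}{2N}\,\frac{P_{ref}}{\rho\, cV}\left( \frac{P_{0}}{P_{ref}} \right) - T_{ref} \right) \right\rbrack - 1 \right\}.} & (34) \end{matrix}$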

Like equation (30), equation (34) is a nonlinear algebraic equation in the unknown ratio

$\frac{P_{0}}{P_{ref}},$

an equation that may readily be solved by iteration as explained above in connection with equation (30). Equation (34) is used instead of equation (30) whenever heat-producing devices are arranged in series, as in the experimental example considered below.
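By way of illustration only, the iterative solution of equation (34) may be sketched in Python as follows. The function name, the argument names, the convergence tolerance, and the iteration limit are assumptions made for this sketch. Setting N=1 removes the series-flow term, so the same routine also solves equation (30). The iteration converges only when successive estimates contract, which is expected when the temperature-dependent fraction λ is modest; the sketch raises an error otherwise.

```python
import math


def solve_power_ratio(lam, kappa, r_bar, p_ref, n, rho_c_v, t0, t_ref,
                      tol=1e-10, max_iter=200):
    """Solve equation (34) for x = P0/P_ref by fixed-point iteration.

    lam, kappa -- leakage fraction and exponential coefficient of equation (24)
    r_bar      -- averaged coolant-to-junction thermal resistance
    p_ref      -- reference power at T = t_ref
    n          -- number of heat-producing devices in series
    rho_c_v    -- product of coolant density, specific heat, and flow rate
    """
    # Both exponents in (34) are linear in x, so collect the coefficient of x.
    a = kappa * (r_bar * p_ref / n + (n - 1) / (2 * n) * p_ref / rho_c_v)
    b = kappa * (t0 - t_ref)
    x = 1.0  # initial guess P0/P_ref = 1, as suggested in the text
    for _ in range(max_iter):
        x_new = 1.0 + lam * (math.exp(a * x + b) - 1.0)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("fixed-point iteration did not converge")
```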

2. Cooling Power P_(Cool)

For simplicity, consider only the largest component of P_(Cool); namely, the power consumed by compressor 514. Then, according to FIG. 5,

P _(Cool)≈(1−β)α(P ₀ +P ₁),  (35)

where β is the fraction of power P₀+P₁ rejected by coolant 506 to free-cooling water 532 via the “free-cooling” heat exchanger 530. As will be shown, β is a function of T₀.

To simplify further, assume P₁≪P₀; that is, assume that the computers 502 consume much more power than the pump 508 that circulates the coolant 506. This is a very reasonable assumption for most systems of the type represented by 500. Thus

P _(Cool)≈(1−β)αP ₀.  (36)

3. Optimum Coolant Temperature T*₀

The range of coolant-inlet temperature T₀ over which P_(Total) must be considered is typically limited to

(T ₀)_(min) ≦T ₀≦(T ₀)_(max-allowed).  (37)

The lower limit (T₀)_(min)≡16° C. is imposed by condensation issues, as discussed previously in connection with equation (15). The upper limit (T₀)_(max-allowed) is imposed by manufacturers of computers 502 due to the cooling requirements thereof, because junction temperatures in the electronics 502 rise at least as much as T₀ rises—in fact slightly more due to the additional leakage power caused by higher temperature—and thus the electronics may be damaged if T₀ is too high.

In general, in the range given by equation (37), for fixed values of all other parameters, there will be an optimum value of T₀, denoted T*₀, where P_(Total)(T₀) is minimized. This optimum is often at one of the two limits (T₀)_(min) or (T₀)_(max-allowed). For example, if P_(Total)(T₀) is a monotonically increasing function in the range given by equation (37), then T*₀=(T₀)_(min). Conversely, if P_(Total)(T₀) is a monotonically decreasing function in the range given by equation (37), then T*₀=(T₀)_(max-allowed). If P_(Total)(T₀) first increases then decreases in the range given by equation (37), then T*₀ is either (T₀)_(min) or (T₀)_(max-allowed). Only if P_(Total)(T₀) first decreases then increases in the range given by equation (37) will there be a local minimum within the range, but even then, this local minimum is not necessarily the global minimum of total power P_(Total)(T₀), as illustrated by example below.

4. Dependence of β on Coolant Temperature T₀

To find the optimum coolant temperature T*₀, it is necessary to derive the functional dependence of β on T₀ for system 500 under the approximations expressed by equation (36). Let

ρ≡Density of coolant 506 [kg/m³]

c≡Heat capacity of coolant 506 [J/kg-° C.]

V≡Volumetric flow rate of coolant 506 [m³/s]

UA≡Figure of merit for the free-cooling heat exchanger 530; the product of its overall heat-transfer coefficient U [W/m²-° C.] and its heat-transfer area A [m²].  (38)

Recall equation (11), repeated here for convenience,

$\begin{matrix} {{\beta = \frac{T_{1} - T_{2}}{T_{1} - T_{0}}},} & (39) \end{matrix}$

which says that the fraction β of the heat load P₀+P₁ removed by free-cooling heat exchanger 530 is equal to the ratio of the temperature drop across the free-cooling heat exchanger 530, T₁−T₂, to the total temperature drop T₁−T₀ across the free-cooling heat exchanger 530 and the chiller 510 combined. In equation (39), the unknown temperature T₁ may be expressed in terms of other variables by writing the energy balance across the computers 502,

P ₀ ≈ρcV(T ₁ −T ₀),  (40)

and the unknown temperature T₂ may be expressed likewise by writing the free-cooling heat-exchanger's heat-transfer equation,

βP ₀ ≈UA(T ₂ −T ₃)  (41)

where T₃ is the sum of wet-bulb temperature T_(WB) and cooling-tower approach temperature ΔT_(CT) as specified by equation (12). In both equations (40) and (41), as previously, the small pump power P₁ is neglected in comparison with the large computer power P₀, as indicated by the approximately-equal signs. Solving for T₁ in equation (40) and T₂ in equation (41) yields

$\begin{matrix} {{T_{1} = {T_{0} + \frac{P_{0}}{\rho \; {cV}}}};} & (42) \\ {T_{2} = {T_{3} + {\frac{\beta \; P_{0}}{UA}.}}} & (43) \end{matrix}$

Substituting equations (42) and (43) into equation (39) yields

$\begin{matrix} {\beta = \frac{\left( {T_{0} + \frac{P_{0}}{\rho \; {cV}}} \right) - \left( {T_{3} + \frac{\beta \; P_{0}}{UA}} \right)}{\frac{P_{0}}{\rho \; {cV}}}} & (44) \end{matrix}$

Solving for β in equation (44) yields

$\begin{matrix} {{\beta = \frac{1 + {\frac{\rho \; {cV}}{P_{0}}\left( {T_{0} - T_{3}} \right)}}{1 + \frac{\rho \; {cV}}{UA}}},} & (45) \end{matrix}$

or, equivalently,

$\begin{matrix} {\beta \equiv {\frac{1 + {\left( \frac{\rho \; {cV}}{P_{ref}} \right)\left( \frac{P_{0}}{P_{ref}} \right)^{- 1}\left( {T_{0} - T_{3}} \right)}}{1 + \frac{\rho \; {cV}}{UA}}.}} & (46) \end{matrix}$

Two exceptional cases occur. First, some combinations of parameters in equation (46) yield β<0. Physically this means that the free-cooling heat exchanger 530 can accomplish no cooling of the incoming hot-side coolant 506 at temperature T₁, because the incoming cold-side temperature T₃=T_(WB)+ΔT_(CT) is larger than T₁. In fact, β<0 implies that the free-cooling heat exchanger would warm the coolant, which would never be allowed; in such a situation, the free-cooling heat exchanger would instead be bypassed, producing β=0.

Second, other combinations of parameters in equation (46) yield β>1. Physically this means that the free-cooling heat exchanger 530 is more than adequate to reduce the incoming hot-side coolant 506 at temperature T₁ to the required temperature T₀. In fact, to achieve the required coolant temperature T₀ with β>1 would involve reheating the coolant after it exits the free-cooling heat exchanger. This would be a waste of energy without reason, and would never be implemented. Instead, the flow of free-cooling water 532 would be modulated to limit the cooling achieved by the free-cooling heat exchanger, as may be necessary to prevent the coolant temperature T₀ from becoming less than (T₀)_(min)≡16° C. Thus, the actual value of β in such cases is β=1. Because of these exceptions, equation (46) must be re-written as follows:

$\begin{matrix} {\beta = \left\{ {\begin{matrix} 0 & {if} & {\hat{\beta} < 0} & \left( {{Regime}\mspace{14mu} I\text{:}\mspace{14mu} {No}\mspace{14mu} {Free}\mspace{14mu} {Cooling}} \right) \\ \hat{\beta} & {if} & {0 \leq \hat{\beta} \leq 1} & \left( {{Regime}\mspace{14mu} {II}\text{:}\mspace{14mu} {Partial}\mspace{14mu} {Free}\mspace{14mu} {Cooling}} \right) \\ 1 & {if} & {\hat{\beta} > 1} & \left( {{Regime}\mspace{14mu} {III}\text{:}\mspace{14mu} 100\% \mspace{14mu} {Free}\mspace{14mu} {Cooling}} \right) \end{matrix}{where}} \right.} & (47) \\ {\hat{\beta} \equiv {\frac{1 + {\left( \frac{\rho \; {cV}}{P_{ref}} \right)\left( \frac{P_{0}}{P_{ref}} \right)^{- 1}\left( {T_{0} - T_{3}} \right)}}{1 + \frac{\rho \; {cV}}{UA}}.}} & (48) \end{matrix}$
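By way of illustration only, equations (47) and (48) may be sketched in Python as follows. The function name, the argument names, and the regime labels returned as strings are assumptions made for this sketch. The unclamped value is computed in the form of equation (45), using P₀ directly, which is algebraically the same as equation (48).

```python
def free_cooling_fraction(t0, t3, p0, rho_c_v, ua):
    """Return (beta, regime) according to equations (47) and (48).

    t0, t3   -- coolant-inlet and free-cooling-water temperatures
    p0       -- computer power
    rho_c_v  -- product of coolant density, specific heat, and flow rate
    ua       -- figure of merit UA of the free-cooling heat exchanger
    """
    # Unclamped value beta-hat, written in the form of equation (45).
    beta_hat = (1.0 + (rho_c_v / p0) * (t0 - t3)) / (1.0 + rho_c_v / ua)
    if beta_hat < 0.0:
        return 0.0, "I: no free cooling"        # heat exchanger bypassed
    if beta_hat > 1.0:
        return 1.0, "III: 100% free cooling"    # free-cooling flow throttled
    return beta_hat, "II: partial free cooling"
```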

5. Total Power P_(Total)

The total power P_(Total) of system 500 is found, with the help of equation (36), to be

P _(Total) ≡P ₀ +P _(Cool)=[1+(1−β)α]P ₀,  (49)

Normalizing by the power P_(ref), which is the value of power P₀ that would occur if all chips were cooled to temperature T_(ref), gives

$\begin{matrix} {\frac{P_{Total}}{P_{ref}} = {\left\lbrack {1 + {\left( {1 - \beta} \right)\alpha}} \right\rbrack {\frac{P_{0}}{P_{ref}}.}}} & (50) \end{matrix}$

The computation occurs in three steps. First,

$\frac{P_{0}}{P_{ref}}$

is found by solving the nonlinear algebraic equation (34) for the given set of the parameters λ, κ, $\overset{\_}{\mathcal{R}}$, P_(ref), T₀, N,

$\frac{P_{ref}}{\rho \; {cV}},$

and T_(ref). Second, β is computed from equations (47) and (48) for the given set of the parameters

$\frac{\rho \; {cV}}{P_{ref}},$

T₀, T₃ and

$\frac{\rho \; {cV}}{UA}.$

Third, the two results are combined, in equation (50), to produce

$\frac{P_{total}}{P_{ref}}.$

The impetus for this invention is that, as coolant-inlet temperature T₀ is increased, the two factors on the right-hand side of equation (50) oppose each other, suggesting that an optimum value of T₀ exists. Specifically, equations (47) and (48) show that, as the coolant temperature T₀ supplied to computers 502 is increased, the fraction β of the heat load P₀ that can be rejected to the “free cooling” heat exchanger 530 either remains the same (when β=0 or β=1) or increases (when 0<β<1), and consequently the fraction 1−β that must be rejected to the chiller 510 either remains the same or decreases, thereby reducing the power consumption of compressor 514. This power saving is precisely why prior-art schemes focus on increasing T₀. However, these prior-art schemes ignore the fact that the power consumption P₀ of the computers 502 increases as T₀ increases, such that the total power of system 500, given by equation (50), may actually rise as T₀ increases. That is, depending on the various parameters, as T₀ increases, the exponential growth of P₀ in equation (34) may overwhelm the diminution of [1+(1−β)α] in equation (50). Consequently, increasing T₀ may actually increase overall power consumption, despite “free-cooling”.
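By way of illustration only, the three-step computation of equation (50) and a search over candidate values of T₀ may be sketched in Python as follows. All function and argument names are assumptions made for this sketch, as are the fixed iteration limit and tolerance; the parameter values used with it would have to come from measurement.

```python
import math


def power_ratio(t0, lam, kappa, r_bar, p_ref, n, rho_c_v, t_ref):
    """Step 1: P0/P_ref from equation (34), by fixed-point iteration."""
    a = kappa * (r_bar * p_ref / n + (n - 1) / (2 * n) * p_ref / rho_c_v)
    b = kappa * (t0 - t_ref)
    x = 1.0
    for _ in range(500):  # returns the latest estimate if the limit is reached
        x_new = 1.0 + lam * (math.exp(a * x + b) - 1.0)
        if abs(x_new - x) < 1e-12:
            break
        x = x_new
    return x_new


def total_power_ratio(t0, t3, alpha, ua, **chip):
    """Steps 2 and 3: beta from (47)-(48), then P_Total/P_ref from (50)."""
    x = power_ratio(t0, **chip)
    g = chip["rho_c_v"]
    beta_hat = (1.0 + (g / chip["p_ref"]) / x * (t0 - t3)) / (1.0 + g / ua)
    beta = min(1.0, max(0.0, beta_hat))  # clamp to the three regimes of (47)
    return (1.0 + (1.0 - beta) * alpha) * x


def best_t0(t0_values, **params):
    """Return the candidate T0 that minimizes P_Total/P_ref."""
    return min(t0_values, key=lambda t0: total_power_ratio(t0, **params))
```

With λ=0 the computer power does not depend on temperature, and the search favors the lowest T₀ that reaches 100% free cooling. With a substantial λ and weather too warm for any free cooling, the factor [1+(1−β)α] is constant, and the search favors the lowest allowed T₀.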

6. Experimental Determination of Computer-Cooling Parameters

Several of the parameters in equation (34) may be determined experimentally for a particular set of computers 502. To illustrate this process, experiments have been conducted on an ensemble of 32 compute chips used in a water-cooled supercomputer called Blue Gene/Q, which is currently being developed by International Business Machines. Thus, in this experiment, the computers 502 are represented by the 32 Blue Gene/Q compute chips, which are arranged in series along the coolant flow path.

Consequently, equation (34) with N=32 is used to obtain

$\frac{P_{0}}{P_{ref}}$

as a function of the inlet water temperature T₀. Power P₀ is the aggregated power of the 32 chips. In this experiment, it is important to understand which parameters in equation (34) depend on the computer hardware only, and which depend both on hardware and on the computational state implied by the software being executed on the hardware.

First, the exponential parameter κ is a function of CMOS technology, so it is fixed for given hardware. In particular, parameter κ is smaller for more advanced generations of CMOS than it is for older generations, as shown in FIG. 9, which is derived from FIG. 1-9 on page 11 of Leakage in Nanometer CMOS Technologies, previously cited. The value of κ appropriate for the Blue Gene/Q chips is derived below from experimental measurements, and found to be roughly in agreement with the values plotted on FIG. 9.

Second, parameter

, the average thermal resistance of the path from a transistor junction to the coolant, is also determined by hardware; namely, by the geometries and thermal conductivities of materials in the thermal path. The value of

appropriate for the Blue Gene/Q system is derived below from experimental measurements.

Third, reference temperature T_(ref) is chosen arbitrarily, so it is the same for all hardware and all computational states. T_(ref)=16° C. is chosen herein, for reasons explained in connection with equation (26).

Fourth, parameter λ is a function of computational state as well as hardware, because, referring to equation (24), different computational states exhibit different fractions of power consumed by temperature-dependent leakage at T=T_(ref). Values of λ appropriate for various computational states of the Blue Gene/Q system are derived below from experimental measurements.

Fifth, reference power P_(ref) is a function of both computational state and hardware, since each combination thereof leads to a different amount of dissipated power P_(ref) at the reference temperature T_(ref). Values of P_(ref) appropriate for various computational states of the Blue Gene/Q compute chips are derived below from experimental measurements.

Referring to FIG. 10, a first experiment on the 32-chip Blue Gene/Q system is conducted in which the chips are powered on but un-initialized, hereafter called the “Un-Initialized” computational state. In this state, the chips are allowed to run for a small number of minutes with zero coolant flow rate, causing the chips to progress slowly through a large range of temperature. During this temperature rise, the ensemble average temperature T of the 32 chips, as well as the overall consumed power P₀, are periodically measured. The result, P₀ vs. T−T_(ref), is shown in FIG. 10 as the set of gray data points 1002, on the curve marked C₁=0. The significance of C₁ will be described presently.

To fit data 1002 to the mathematical model herein, equation (24) may be written as

$\begin{matrix} {P_{0} = C_{1} + C_{2}\exp\left\lbrack \kappa\left( T - T_{ref} \right) \right\rbrack,\quad\text{where}} & (51) \\ {C_{1} \equiv P_{ref}\left( 1 - \lambda \right);\quad C_{2} \equiv P_{ref}\lambda,\quad\text{whence}} & (52) \\ {P_{ref} = C_{1} + C_{2};\quad \lambda = \frac{C_{2}}{C_{1} + C_{2}}.} & (53) \end{matrix}$

Rather than performing a three-parameter nonlinear regression to choose values of C₁, C₂ and κ in equation (51) that best fit dataset 1002, it is simpler to note that equation (51) may be written as

y=C ₂exp(κx),  (54)

where

x≡T−T _(ref) ; y≡P ₀ −C ₁.  (55)

Taking the natural log of both sides of equation (54) yields a two-parameter linear regression for the unknown regression constants C₂ and κ. A series of such linear regressions may be easily performed for various assumed values of C₁. For each regression, the best-fit values of C₂ and κ, as well as the correlation coefficient R², are recorded. The value of C₁ producing the highest correlation R² is then assumed to provide the best fit to the three-parameter model (51). Specifically, referring again to FIG. 10, regression with the value C₁=0 produces C₂=221.8 W, κ=0.01384, and a correlation coefficient R²=0.99856, corresponding to the curve shown in FIG. 10 as the solid black line that overlays dataset 1002. However, the best result is obtained with the value C₁=75 W, which produces C₂=151.3 W, κ=0.01789, and a correlation coefficient R²=0.99924, corresponding to the curve shown in FIG. 10 as the dashed black line that overlays dataset 1004, which is shifted downward by C₁=75 W in accordance with the transformation y=P₀−C₁ in equation (55). Consequently, the following values are adopted for the Un-Initialized state:

C₁=75.0 W; C₂=151.3 W; κ=0.01789.  (56)
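By way of illustration only, the scan over assumed values of C₁ described above may be sketched as follows. The function names, the candidate grid for C₁, and the data are assumptions introduced for this sketch: the data are synthetic, noise-free points generated from the values in equations (56), not the measured dataset 1002.

```python
import math

def fit_log_linear(x, y):
    """Least-squares fit of ln(y) = ln(C2) + kappa * x.

    Returns (C2, kappa, R^2), per equation (54) after taking logs.
    """
    ln_y = [math.log(v) for v in y]
    n = len(x)
    mean_x = sum(x) / n
    mean_ln_y = sum(ln_y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_ln_y) ** 2 for b in ln_y)
    sxy = sum((a - mean_x) * (b - mean_ln_y) for a, b in zip(x, ln_y))
    kappa = sxy / sxx
    c2 = math.exp(mean_ln_y - kappa * mean_x)
    r2 = sxy * sxy / (sxx * syy)
    return c2, kappa, r2

def scan_c1(x, p0, c1_candidates):
    """Try each assumed C1 and keep the fit with the highest R^2."""
    best = None
    for c1 in c1_candidates:
        y = [p - c1 for p in p0]          # transformation (55)
        if min(y) <= 0.0:
            continue                      # logarithm undefined for this C1
        c2, kappa, r2 = fit_log_linear(x, y)
        if best is None or r2 > best[3]:
            best = (c1, c2, kappa, r2)
    return best

# Synthetic stand-in data (assumption): x = T - T_ref in deg C, P0 in W.
x = [float(dt) for dt in range(10, 61, 5)]
p0 = [75.0 + 151.3 * math.exp(0.01789 * dt) for dt in x]

c1, c2, kappa, r2 = scan_c1(x, p0, [float(c) for c in range(0, 151, 5)])
print(c1, round(c2, 1), round(kappa, 5))
```

With real measurements the maximum of R² would be less sharp than for noise-free data, so a finer grid of C₁ near the best coarse value might be used.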

Substituting equations (56) into the first of equations (53) yields the reference power

P_(ref)=226.3 W,  (57)

which is the value of P₀ that would occur if all chips were cooled to the reference temperature T_(ref). Substituting equations (56) into the second of equations (53) yields

λ=0.6686.  (58)

That is, referring to equation (24), in the Un-Initialized state, 66.9% of the power at T=T_(ref) is temperature-dependent, sub-threshold leakage power. The remaining 33.1% is gate-leakage power, because there is no active power in this state.

In equations (56), the experimentally deduced value of the important parameter κ, namely,

κ=0.01789,  (59)

is similar to the textbook value shown on FIG. 9. The CMOS transistor gate length for the Blue Gene/Q chips is 45 nm, for which FIG. 9 would predict κ≈0.016. This rough agreement with equation (59) validates the methods above.

Let

f(T)≡Fraction of power that is sub-threshold leakage power, as a function of junction temperature T.  (60)

According to the two terms in equation (24),

$\begin{matrix} {{f(T)} \equiv {\frac{\lambda \; {\exp\left\lbrack {\kappa \left( {T - T_{ref}} \right)} \right.}}{\left( {1 - \lambda} \right) + {\lambda \; {\exp\left\lbrack {\kappa \left( {T - T_{ref}} \right)} \right.}}}.}} & (61) \end{matrix}$

For example, for the Un-Initialized computational state for which equation (58) applies,

f(30° C.)=0.722; f(40° C.)=0.756.  (62)

That is, at a junction temperature of 30° C., sub-threshold leakage in the Un-Initialized state consumes 72.2% of total power; at 40° C., it consumes 75.6%.
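As a minimal sketch, equations (53) and (61) may be evaluated numerically as shown below for the Un-Initialized state. The function names are hypothetical and are introduced only for this illustration; the numerical inputs are those of equations (56), with T_(ref)=16° C.

```python
import math

def params_from_c(c1, c2):
    """Equation (53): reference power and leakage fraction at T_ref."""
    p_ref = c1 + c2
    lam = c2 / (c1 + c2)
    return p_ref, lam

def leakage_fraction(t_junction, lam, kappa, t_ref=16.0):
    """Equation (61): fraction of power that is sub-threshold leakage."""
    leak = lam * math.exp(kappa * (t_junction - t_ref))
    return leak / ((1.0 - lam) + leak)

p_ref, lam = params_from_c(75.0, 151.3)      # values from equations (56)
f30 = leakage_fraction(30.0, lam, 0.01789)
f40 = leakage_fraction(40.0, lam, 0.01789)
print(round(p_ref, 1), round(lam, 4), round(f30, 3), round(f40, 3))
# prints: 226.3 0.6686 0.722 0.756
```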

Referring to FIG. 11, the experiment described above for the Un-Initialized state is represented by Cases A and B, where points A and B respectively are extracted from FIG. 10 for comparison to other computational states. Values for λ, P_(ref), and κ, derived above in equations (57) through (59), are also noted, with the reminder that κ applies to all computational states.

Two additional experiments with the system of 32 Blue Gene/Q chips, summarized in FIG. 11 as Cases C through F, represent computational states referred to as “Idle” (Cases C and D) and “Full Power” (Cases E and F). Unlike Cases A and B, which were obtained with zero coolant flow, Cases C through F were obtained with coolant flowing at the rate

V=3.3 liter/min=5.48E−5 m³/s,  (63)

so that a steady thermal state is reached. The coolant is water, so

$\begin{matrix} {{\rho = {1000\left\lbrack \frac{kg}{m^{3}} \right\rbrack}};{c = {{4180\left\lbrack \frac{J}{{kg} \cdot {{^\circ}\mspace{14mu} {C.}}} \right\rbrack}.}}} & (64) \end{matrix}$

Cases C and E were run at a low value of inlet coolant temperature T₀, whereas Cases D and F were run at a somewhat higher value of coolant temperature T₀. As for Cases A and B, each value of junction temperature T tabulated for Cases C through F represents a measured, ensemble average of 32 junction temperatures reported by the 32 chips used in the experiment. Each value of P₀ represents the measured, aggregate power consumed by the 32 chips. The ensemble-average coolant temperature T₀ is calculated for each case according to equation (31), using N=32 and the values in equations (63) and (64).

Thermal impedance

of the path from coolant to transistor junction, as depicted in FIG. 8, may be determined from any one of the Cases C through F. For example, substituting the values for Case F from FIG. 11 into equation (32) yields

$\begin{matrix} {{= {\frac{T - {\overset{\_}{T}}_{0}}{\frac{P_{0}}{N}} = {\frac{67.1 - 28.4}{\frac{1610.9}{32}} = {0.769\left\lbrack \frac{{^\circ}\mspace{14mu} {C.}}{W} \right\rbrack}}}};} & (65) \end{matrix}$

In principle, this result applies to all Cases, because it is a thermal characterization of the physical path from coolant to junction, which does not change with computational state. In reality, somewhat different results are obtained for other cases:

=0.700 [° C./W] for Case C;

=0.722 [° C./W] for Case D;

=0.791 [° C./W] for Case E. The result for Case F, given in equation (65), is chosen for subsequent computations because the temperature difference in the numerator of (65) is the largest of all the Cases, and thus is believed to be the most accurate.
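The arithmetic of equation (65) may be sketched as follows, assuming only the Case F values quoted in that equation. The function name and argument names are hypothetical labels chosen for this illustration.

```python
def thermal_resistance(t_junction, t_coolant_avg, p0_total, n_chips):
    """Coolant-to-junction thermal resistance in deg C per W.

    Temperature rise from average coolant to junction, divided by
    the average power per chip, as in equation (65).
    """
    return (t_junction - t_coolant_avg) / (p0_total / n_chips)

# Case F values as they appear in equation (65).
r_case_f = thermal_resistance(67.1, 28.4, 1610.9, 32)
print(round(r_case_f, 3))  # prints: 0.769
```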

Consider now the “Idle” computational state represented by Cases C and D in FIG. 11. Denoting Cases C and D by subscripts “C” and “D” respectively, and applying equation (25) to each Case yields

$\begin{matrix} {{{Case}\mspace{14mu} C\text{:}\mspace{14mu} \frac{P_{0\; C}}{P_{ref}}} = {1 + {\lambda \left\{ {{\exp \left\lbrack {\kappa \left( {T_{C} - T_{ref}} \right)} \right\rbrack} - 1} \right\}}}} & (66) \\ {{{Case}\mspace{14mu} D\text{:}\mspace{14mu} \frac{P_{0\; D}}{P_{ref}}} = {1 + {\lambda {\left\{ {{\exp \left\lbrack {\kappa \left( {T_{D} - T_{ref}} \right)} \right\rbrack} - 1} \right\}.}}}} & (67) \end{matrix}$

Equations (66) and (67) represent two equations in the two unknown parameters P_(ref) and λ. Dividing equation (67) by equation (66) to eliminate P_(ref) yields

$\begin{matrix} {\frac{P_{0\; D}}{P_{0\; C}} = \frac{1 + {\lambda \left\{ {{\exp \left\lbrack {\kappa \left( {T_{D} - T_{ref}} \right)} \right\rbrack} - 1} \right\}}}{1 + {\lambda \left\{ {{\exp \left\lbrack {\kappa \left( {T_{C} - T_{ref}} \right)} \right\rbrack} - 1} \right\}}}} & (68) \end{matrix}$

Clearing fractions and solving for λ yields

$\begin{matrix} {\lambda = {\frac{P_{0\; D} - P_{0\; C}}{\begin{matrix} {{P_{0\; C}\left\{ {{\exp \left\lbrack {\kappa \left( {T_{D} - T_{ref}} \right)} \right\rbrack} - 1} \right\}} -} \\ {P_{0\; D}\left\{ {{\exp \left\lbrack {\kappa \left( {T_{C} - T_{ref}} \right)} \right\rbrack} - 1} \right\}} \end{matrix}}.}} & (69) \end{matrix}$

From FIG. 11, for Cases C and D, the measured values are

P_(0C)=935.7 W; P_(0D)=978.2 W; T_(C)=38.1° C.; T_(D)=48.5° C.  (70)

In light of equations (26) and (59), substituting the values from equations (70) into equation (69) yields

λ=0.1613.  (71)

This means that, in the “Idle” computational state, if the transistor junctions of the Blue Gene/Q chips are all held at the reference temperature T_(ref)=16° C., 16.1% of the power P₀ is temperature-dependent, sub-threshold leakage power. At a junction temperature T higher than T_(ref), the fraction f of power consumed by sub-threshold leakage is higher. For example, applying equation (61) for the junction temperatures T_(C) and T_(D) measured experimentally in the “Idle” state, the power fraction f is

$\begin{matrix} {{f_{C} \equiv \frac{\lambda \; {\exp\left\lbrack {\kappa \left( {T_{C} - T_{ref}} \right)} \right\rbrack}}{\left( {1 - \lambda} \right) + {\lambda \; {\exp\left\lbrack {\kappa \left( {T_{C} - T_{ref}} \right)} \right\rbrack}}}} = 0.222} & (72) \\ {{f_{D} \equiv \frac{\lambda \; {\exp\left\lbrack {\kappa \left( {T_{D} - T_{ref}} \right)} \right\rbrack}}{\left( {1 - \lambda} \right) + {\lambda \; {\exp\left\lbrack {\kappa \left( {T_{D} - T_{ref}} \right)} \right\rbrack}}}} = 0.256} & (73) \end{matrix}$

That is, at the operating temperatures found for Cases C and D, sub-threshold leakage is 22.2% and 25.6% of total power, respectively.

The reference power P_(ref) for the “Idle” computational state may be found by substituting the values from equations (70) and (71) into equation (66):

$\begin{matrix} \begin{matrix} {P_{ref} = \frac{P_{0\; C}}{1 + {\lambda \left\{ {{\exp \left\lbrack {\kappa \left( {T_{C} - T_{ref}} \right)} \right\rbrack} - 1} \right\}}}} \\ {= \frac{935.7}{1 + {0.1613\left\{ {{\exp \left\lbrack {(0.01789)(22.1)} \right\rbrack} - 1} \right\}}}} \\ {= {867.8\mspace{14mu} W}} \end{matrix} & (74) \end{matrix}$

Similar substitution into equation (67) yields

$\begin{matrix} \begin{matrix} {P_{ref} = \frac{P_{0\; D}}{1 + {\lambda \left\{ {\exp \left\lbrack {\kappa \left( {T_{D} - T_{ref}} \right)} \right\rbrack} \right\}}}} \\ {= \frac{978.2}{1 + {0.1613\left\{ {{\exp \left\lbrack {(0.01789)(32.5)} \right\rbrack} - 1} \right\}}}} \\ {= {867.8\mspace{14mu} W}} \end{matrix} & (75) \end{matrix}$

As expected, the two results agree: the computed value of λ ensures it.

Consider now the “Full Power” computational state represented by Cases E and F on FIG. 11. Denoting Cases E and F by subscripts “E” and “F” respectively, and proceeding as for the “Idle” state, produces by analogy to equation (69)

$\begin{matrix} {\lambda = {\frac{P_{0\; F} - P_{0\; E}}{\begin{matrix} {{P_{0\; E}\left\{ {{\exp \left\lbrack {\kappa \left( {T_{F} - T_{ref}} \right)} \right\rbrack} - 1} \right\}} -} \\ {P_{0\; F}\left\{ {{\exp \left\lbrack {\kappa \left( {T_{E} - T_{ref}} \right)} \right\rbrack} - 1} \right\}} \end{matrix}}.}} & (76) \end{matrix}$

From FIG. 11, for Cases E and F, the measured values are

P_(0E)=1558.1 W; P_(0F)=1610.9 W; T_(E)=56.8° C.; T_(F)=67.1° C.  (77)

In light of equations (26) and (59), substituting the values from equations (77) into equation (76) yields, for the “Full Power” computational state

λ=0.08839.  (78)

This means that, in the “Full Power” computational state, if the transistor junctions of the Blue Gene/Q chips are all held at the reference temperature T_(ref)=16° C., 8.8% of the power P₀ is temperature-dependent, sub-threshold leakage power. At a junction temperature T higher than T_(ref), the fraction f of power consumed by sub-threshold leakage will be higher. Using definition (61), for Case E, the fraction f is

$\begin{matrix} {{f_{E} \equiv \frac{\lambda \; {\exp\left\lbrack {\kappa \left( {T_{E} - T_{ref}} \right)} \right\rbrack}}{\left( {1 - \lambda} \right) + {\lambda \; {\exp\left\lbrack {\kappa \left( {T_{E} - T_{ref}} \right)} \right\rbrack}}}} = 0.167} & (79) \end{matrix}$

Similarly, for Case F, the fraction f is

$\begin{matrix} {{f_{F} \equiv \frac{\lambda \; {\exp\left\lbrack {\kappa \left( {T_{F} - T_{ref}} \right)} \right\rbrack}}{\left( {1 - \lambda} \right) + {\lambda \; {\exp\left\lbrack {\kappa \left( {T_{F} - T_{ref}} \right)} \right\rbrack}}}} = 0.195} & (80) \end{matrix}$

That is, in the “Full Power” state, at the operating temperatures found for Cases E and F, sub-threshold leakage power is 16.7% and 19.5% of total power, respectively. The reference power P_(ref) for the “Full Power” computational state is computed as described above for the “Idle” state. For Case E:

$\begin{matrix} \begin{matrix} {P_{ref} = \frac{P_{0E}}{1 + {\lambda \left\{ {{\exp \left\lbrack {\kappa \left( {T_{E} - T_{ref}} \right)} \right\rbrack} - 1} \right\}}}} \\ {= \frac{1558.1}{1 + {0.08839\left\{ {{\exp \left\lbrack {(0.01789)(40.8)} \right\rbrack} - 1} \right\}}}} \\ {= {1422.9\mspace{14mu} W}} \end{matrix} & (81) \end{matrix}$

Similarly, for Case F:

$\begin{matrix} \begin{matrix} {P_{ref} = \frac{P_{0F}}{1 + {\lambda \left\{ {\exp \left\lbrack {\kappa \left( {T_{F} - T_{ref}} \right)} \right\rbrack} \right\}}}} \\ {= \frac{1610.9}{1 + {0.08839\left\{ {{\exp \left\lbrack {(0.01789)(51.1)} \right\rbrack} - 1} \right\}}}} \\ {= {1422.9\mspace{14mu} W}} \end{matrix} & (82) \end{matrix}$

As expected, the two results agree: the computed value of λ ensures it.
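The two-point procedure used above for the “Idle” and “Full Power” states may be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, κ is taken from equation (59), T_(ref)=16° C. is taken from the text, and the measured inputs are those of equations (70) and (77).

```python
import math

KAPPA = 0.01789   # equation (59)
T_REF = 16.0      # reference temperature, deg C

def leakage_growth(t_junction):
    """The term exp[kappa (T - T_ref)] - 1 appearing in (66) and (67)."""
    return math.exp(KAPPA * (t_junction - T_REF)) - 1.0

def lambda_two_point(p_low, t_low, p_high, t_high):
    """Equations (69) and (76): lambda from two (power, temperature) cases."""
    numerator = p_high - p_low
    denominator = p_low * leakage_growth(t_high) - p_high * leakage_growth(t_low)
    return numerator / denominator

def p_ref_from_case(p0, t_junction, lam):
    """Equations (74) and (81): reference power from a single case."""
    return p0 / (1.0 + lam * leakage_growth(t_junction))

# "Idle" state, Cases C and D, equation (70).
lam_idle = lambda_two_point(935.7, 38.1, 978.2, 48.5)
p_ref_idle = p_ref_from_case(935.7, 38.1, lam_idle)

# "Full Power" state, Cases E and F, equation (77).
lam_full = lambda_two_point(1558.1, 56.8, 1610.9, 67.1)
p_ref_full = p_ref_from_case(1558.1, 56.8, lam_full)

print(lam_idle, p_ref_idle)
print(lam_full, p_ref_full)
```

The printed values may be compared against equations (71), (74), (78), and (81). Because λ is obtained by eliminating P_(ref) between the two cases, evaluating `p_ref_from_case` at either case of a pair returns the same reference power, up to rounding.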

Referring to FIG. 12, substituting parameter values from FIG. 11 and equations (63) and (64) into the non-linear algebraic equation (34), and solving iteratively for

$\frac{P_{0}}{P_{ref}}$

yields the curves labeled “λ=0.6686”, “λ=0.1613”, and “λ=0.08839” on FIG. 12, which correspond to the Un-Initialized, Idle, and Full-Power computational states, respectively. Recall that, by definition, P_(ref) is the value of P₀ that would occur if the temperature T of all transistors were held at (T₀)_(min)=16° C. Consequently, it should be no surprise that

$\frac{P_{0}}{P_{ref}} > 1$

at T₀=16° C. on FIG. 12. Instead,

$\frac{P_{0}}{P_{ref}} = 1$

occurs at some value of inlet coolant temperature T₀ lower than 16° C.; specifically, at the value of T₀ corresponding to a value of chip-averaged coolant temperature T ₀ that allows transistors in the average chip to achieve 16° C. in the presence of the thermal resistance

between coolant and transistor. To simplify the interpretation of plots such as FIG. 12, it is convenient to define a reference power denoted (P₀)_(16° C.), whose value, like P_(ref), will vary with λ:

(P ₀)_(16° C.) ≡P ₀ @T ₀=16° C.  (83)

That is, (P₀)_(16° C.) is the consumed power for the ensemble of chips when the coolant inlet temperature at the first chip in the series is 16° C. For each curve on FIG. 12, the values of (P₀)_(16° C.) may be read at T₀=16° C. as a multiple of P_(ref). The values of (P₀)_(16° C.) for the three experimental computational states are tabulated in FIG. 11. Converting from reference power P_(ref) to this more-intuitive reference power (P₀)_(16° C.) defined by equation (83) produces FIG. 13, which is simply a re-normalized version of FIG. 12. From FIG. 13 it is clear, for example, that when the chips are in the “Un-Initialized” computational state, corresponding to the curve λ=0.6686, the penalty for increasing inlet temperature from 16° C. to 40° C. is a 42% increase in chip power consumption.

The experiments described above show that the parameters λ and P_(ref) depend on computational state. However, P_(ref) appears to be related to λ, as shown on FIG. 14, where regression is used to obtain a best-fit power law to the three Cases tabulated in FIG. 11. Consequently, for purposes of the numerical study below, the following assumption is adopted for the 32-chip Blue Gene/Q system:

P_(ref)=169.42λ^(−0.91957).  (84)

7. Dependence of Total Power P_(Total) on Parameters

The above results characterizing computer power P₀ may now be used to study P_(Total)=P₀+P_(Cool), where equation (50) for P_(Total) may be renormalized as

$\begin{matrix} {\frac{P_{Total}}{\left( P_{0} \right)_{16{^\circ}\mspace{14mu} {C.}}} = {\left\lbrack {1 + {\left( {1 - \beta} \right)\alpha}} \right\rbrack {\frac{P_{0}}{\left( P_{0} \right)_{16{^\circ}\mspace{14mu} {C.}}}.}}} & (85) \end{matrix}$

The computation occurs in four steps. First,

$\frac{P_{0}}{P_{ref}}$

as a function of T₀, denoted

${\frac{P_{0}}{P_{ref}}\left( T_{0} \right)},$

is found by repeatedly solving the nonlinear algebraic equation (34) for the given set of the parameters

$\begin{matrix} {\lambda,\kappa,,P_{ref},N,\frac{P_{ref}}{\rho \; {cV}},{{and}\mspace{14mu} {T_{ref}.}}} & (86) \end{matrix}$

Second, each result for

$\frac{P_{0}}{P_{ref}}\left( T_{0} \right)$

is normalized by the result at T₀=16° C., denoted

${\frac{P_{0}}{P_{ref}}\left( {16{^\circ}\mspace{14mu} {C.}} \right)},$

to produce

$\begin{matrix} {{\frac{P_{0}}{\left( P_{0} \right)_{16{^\circ}\mspace{14mu} {C.}}}\left( T_{0} \right)} = {\frac{\frac{P_{0}}{P_{ref}}\left( T_{0} \right)}{\frac{P_{0}}{P_{ref}}\left( {16{^\circ}\mspace{14mu} {C.}} \right)}.}} & (87) \end{matrix}$

Third, the factor [1+(1−β)α] in equation (85) is computed from equations (47) and (48) for the given set of the parameters

$\begin{matrix} {\alpha,\frac{\rho \; {cV}}{P_{ref}},T_{0},T_{3},{{and}\mspace{14mu} {\frac{\rho \; {cV}}{UA}.}}} & (88) \end{matrix}$

Fourth, the results for

$\frac{P_{0}}{\left( P_{0} \right)_{16{^\circ}\mspace{14mu} {C.}}}\left( T_{0} \right)$

and [1+(1−β)α] are combined, in equation (85), to produce

$\frac{P_{Total}}{\left( P_{0} \right)_{16{^\circ}\mspace{14mu} {C.}}}.$

Numerical values of the several parameters shall now be adopted for use in FIGS. 15-24. Values of κ,

, T_(ref), N, and ρcV are adopted from the Blue Gene/Q experiments described above:

κ=0.01789;

=0.769[° C./W]; T_(ref)=16[° C.]; N=32; ρcV=229[W/° C.].  (89)

Combining equations (84) and (89) yields the functional relationship between λ and the parameter grouping

$\frac{\rho \; {cV}}{P_{ref}}$

that appears in equations (34) and (46):

$\begin{matrix} {\frac{\rho \; {cV}}{P_{ref}} = {1.3517\mspace{11mu} {\lambda^{0.91957}.}}} & (90) \end{matrix}$

Assume that the free-cooling heat exchanger, sized to handle the small amount of power represented by the 32 Blue Gene/Q compute chips, has the figure of merit

UA=458[W/° C.],  (91)

which is tantamount to assuming the dimensionless parameter

$\begin{matrix} {\frac{\rho \; {cV}}{UA} = {0.5.}} & (92) \end{matrix}$

Assume that the chiller consumes power equal to 15% of the heat that it transfers, so

α=0.15.  (93)

As a result of the numerical assumptions above,

$\frac{P_{Total}}{\left( P_{0} \right)_{16{^\circ}\mspace{14mu} {C.}}}$

vs. T₀ may be plotted, as a function of just two free parameters, λ and T₃, because all other parameters are known or assumed from equations (89) through (93). Specifically, the numerical values of κ,

, N, and ρcV are known for the given system of 32 Blue Gene/Q compute chips, the numerical values of T_(ref),

$\frac{\rho \; {cV}}{UA},$

and α are assumed, and the numerical values of P_(ref) and

$\frac{\rho \; {cV}}{P_{ref}}$

depend only on λ, according to equations (84) and (90). The two free parameters λ and T₃ correspond to the two primary variables of the system 500; namely, the computational state, as represented by λ, and the weather, as represented by T₃, inasmuch as T₃ is related to wet-bulb temperature, according to equation (12). To illustrate the computation process described above, consider the case

λ=0.5; T₃=21° C.; T₀=20° C.  (94)

Substituting (94) into equation (84) produces

P _(ref)=169.42(0.5)^(−0.91957)=320.47 W.  (95)

Substituting values from equations (89), (94) and (95) into equation (34) yields, after several iterations,

$\begin{matrix} {{\frac{P_{0}}{P_{ref}}\left( {T_{0} = {20{^\circ}\mspace{14mu} {C.}}} \right)} = {1.127.}} & (96) \end{matrix}$

Repeating the latter procedure with T₀=16° C. produces

$\begin{matrix} {{\frac{P_{0}}{P_{ref}}\left( {T_{0} = {16{^\circ}\mspace{14mu} {C.}}} \right)} = {1.080.}} & (97) \end{matrix}$

Substituting equations (96) and (97) into equation (87) yields

$\begin{matrix} {{\frac{P_{0}}{\left( P_{0} \right)_{16{^\circ}\mspace{14mu} {C.}}}\left( {T_{0} = {20{^\circ}\mspace{14mu} {C.}}} \right)} = {\frac{1.127}{1.080} = {1.044.}}} & (98) \end{matrix}$

Substituting equations (89), (92), (94), (95), and (96) into equations (47) and (48) yields

β={circumflex over (β)}=0.244.  (99)

Substituting (93), (98), and (99) into (85) yields

$\begin{matrix} {\frac{P_{Total}}{\left( P_{0} \right)_{16{^\circ}\mspace{11mu} {C.}}} = {{\left\lbrack {1 + {\left( {1 - 0.244} \right)(0.15)}} \right\rbrack (1.044)} = {1.162.}}} & (100) \end{matrix}$
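The final two steps of the worked example, equations (87) and (85), may be sketched as follows. The iterative solution of equation (34) and the evaluation of β from equations (47) and (48) are not reproduced here; their results, from equations (96), (97), and (99), are simply taken as inputs. The function name is a hypothetical label.

```python
def normalized_total_power(beta, alpha, p0_ratio_at_t0, p0_ratio_at_16):
    """Combine equations (87) and (85).

    beta:            free-cooling fraction, equation (99)
    alpha:           chiller power per unit heat transferred, equation (93)
    p0_ratio_at_t0:  P0/P_ref at the chosen T0, equation (96)
    p0_ratio_at_16:  P0/P_ref at T0 = 16 deg C, equation (97)
    """
    p0_normalized = p0_ratio_at_t0 / p0_ratio_at_16          # equation (87)
    return (1.0 + (1.0 - beta) * alpha) * p0_normalized      # equation (85)

result = normalized_total_power(beta=0.244, alpha=0.15,
                                p0_ratio_at_t0=1.127, p0_ratio_at_16=1.080)
print(round(result, 3))  # prints: 1.162
```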

Of the three free parameters in (94), only λ affects (P₀)_(16° C.). As shown on FIG. 15, numerical studies of the type just described show that, given (84), (89), and (93),

(P ₀)_(16° C.)=182.83λ^(−0.92338) [W].  (101)

Equation (101) is helpful for physically interpreting subsequent plots where P_(Total) is normalized by (P₀)_(16° C.). Repeating the calculations represented by (94) through (100) many times, for various values of the three parameters in (94), yields FIGS. 16 through 24, in which

$\frac{P_{Total}}{\left( P_{0} \right)_{16{^\circ}\mspace{14mu} {C.}}}$

is plotted vs. T₀, λ, and T₃. That is, FIGS. 16 through 24 show how the total system power P_(Total) (computer power plus chiller power) varies as a function of coolant inlet temperature (T₀), computational state (as represented by λ) and weather (as represented by T₃). In these plots, T₀ varies over the full practical range of inlet coolant temperature,

(T ₀)_(min) ≦T ₀≦(T ₀)_(max),  (102)

where (T₀)_(min)≡16° C. and (T₀)_(max)≡39° C. as discussed in connection with equations (14) and (15).

In general, each curve on FIGS. 16-24 has either one, two, or three segments. From left to right, these segments are denoted as Regimes I, II, and III. The three Regimes correspond to the value of the “free-cooling” fraction β, defined by equations (47) and (48) above:

Regime I: β=0 (No free cooling)

Regime II: 0<β<1 (Mixed free cooling and chiller cooling)

Regime III: β=1 (100% free cooling).  (103)

For example, for the curve λ=0.9 on FIG. 21, Regime I prevails for T₀≦25.9° C., Regime II prevails for 25.9° C.<T₀≦27.6° C., and Regime III prevails for T₀>27.6° C. Physically speaking, Regime I prevails when the temperature T₃ of the water returned from the cooling tower 538 to the free-cooling heat exchanger 530 is so warm that free-cooling cannot provide any assistance in achieving the specified value of T₀. Consequently, Regime I tends to occur when T₃ is high and T₀ is low; for example, on the left portion of FIGS. 22 through 24. Regime II prevails when T₃ is cool enough to allow the free-cooling heat exchanger 530 to provide some but not all of the cooling required to achieve the specified value of T₀. Regime III prevails when T₃ is cool enough to allow free-cooling to provide all of the cooling required to achieve the specified value of T₀. Consequently, Regime III occurs when T₃ is low and/or the specified value of T₀ is high; for example, on the right portion of FIGS. 16-23.
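The classification in (103) may be expressed as a small helper, sketched below under the assumption that β has already been computed from equations (47) and (48) and lies between 0 and 1. The function name and the string labels are hypothetical.

```python
def cooling_regime(beta):
    """Map the free-cooling fraction beta to Regime I, II, or III per (103)."""
    if beta <= 0.0:
        return "I"      # no free cooling; the chiller carries the full load
    if beta >= 1.0:
        return "III"    # 100% free cooling; the chiller is idle
    return "II"         # mixed free cooling and chiller cooling

# beta = 0.244 is the value found in equation (99).
print(cooling_regime(0.0), cooling_regime(0.244), cooling_regime(1.0))
# prints: I II III
```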

On FIGS. 16-24, for each curve where boundaries between Regimes exist, let

$\begin{matrix} {\left( T_{0} \right)_{\text{Local Max}} \equiv \text{Boundary between Regimes I and II, typically corresponding to a local maximum of } \frac{P_{Total}}{\left( P_{0} \right)_{16{^\circ}\mspace{14mu} {C.}}};} & (104) \\ {\left( T_{0} \right)_{\text{Local Min}} \equiv \text{Boundary between Regimes II and III, typically corresponding to a local minimum of } \frac{P_{Total}}{\left( P_{0} \right)_{16{^\circ}\mspace{14mu} {C.}}}.} & (105) \end{matrix}$

For example, referring to the aforesaid curve λ=0.9 on FIG. 21,

(T ₀)_(Local Max)=25.9° C. and

(T ₀)_(Local Min)=27.6° C.

In general, P_(Total) always increases with T₀ in Regimes I and III, due to the exponential dependence of P₀ on T ₀ in equation (33). P_(Total) typically decreases with T₀ in Regime II, due to the gradual onset of free cooling, where β grows from 0 to 1, and thus the factor [1+(1−β)α] in equation (85) decreases. However, depending on parameters, particularly κ, the aforesaid exponential growth in P₀ may, in equation (85), outweigh the decrease of [1+(1−β)α]. In such a case, P_(Total) is monotonically increasing with T₀, no local minimum exists at the boundary between Regimes II and III, and (T₀)_(Local Min) is a misnomer. Nevertheless, for all cases shown herein, the local minimum at (T₀)_(Local Min) exists.

Two features of FIGS. 16-24 require explanation. First, on FIG. 16, Regime III prevails throughout, which means that the chiller is idle throughout, so P_(Cool)=0, whence P_(Total)=P₀. Second, for all curves on FIGS. 19-24, where Regime I prevails at T₀=16° C.,

$\begin{matrix} {\frac{P_{Total}}{\left( P_{0} \right)_{16{^\circ}\mspace{14mu} {C.}}} = {{1.15\mspace{14mu}@\mspace{14mu} T_{0}} = {16{^\circ}\mspace{14mu} {C.}}}} & (106) \end{matrix}$

This is a natural consequence of assumption (93), α=0.15, because, according to equation (85), in Regime I where β=0,

$\begin{matrix} {\left. \frac{P_{Total}}{\left( P_{0} \right)_{16{^\circ}\mspace{14mu} {C.}}} \right|_{T_{0} = {16{^\circ}\mspace{14mu} {C.}}} = {\left. {\left\lbrack {1 + {\left( {1 - \beta} \right)\alpha}} \right\rbrack \frac{P_{0}}{\left( P_{0} \right)_{16{^\circ}\mspace{14mu} {C.}}}} \right|_{T_{0} = {16{^\circ}\mspace{14mu} {C.}}} = {{\left\lbrack {1 + {\left( {1 - 0} \right)(0.15)}} \right\rbrack (1)} = {1.15.}}}} & (107) \end{matrix}$

That is, at T₀=16° C., the fully utilized chiller consumes 15% of P₀, so total power is 1.15P₀.

It is clear that, as T₃ increases beyond a certain point, free cooling ceases to produce lower total power P_(Total) compared to conventional cooling at T₀=16° C. Specifically, FIG. 25, which is derived from FIGS. 16-24, compares the free-cooling-induced local minima in FIGS. 16-24, denoted (P_(Total))_(Local Min), to the value of P_(Total) at T₀=16° C., denoted (P_(Total))_(16° C.). For example, on FIG. 22 (T₃=30° C.), on the curve λ=0.9,

$\begin{matrix} {\frac{\left( P_{Total} \right)_{\text{Local Min}}}{\left( P_{0} \right)_{16{^\circ}\mspace{14mu} {C.}}} = 1.302\ @\ T_{0} = 30.6{^\circ}\mspace{14mu} {C.},\quad\text{whereas}} & (108) \\ {\frac{\left( P_{Total} \right)_{16{^\circ}\mspace{14mu} {C.}}}{\left( P_{0} \right)_{16{^\circ}\mspace{14mu} {C.}}} = 1.15\ @\ T_{0} = 16{^\circ}\mspace{14mu} {C.}\quad\text{Consequently,}} & (109) \\ {\frac{\left( P_{Total} \right)_{\text{Local Min}}}{\left( P_{Total} \right)_{16{^\circ}\mspace{14mu} {C.}}} = \frac{1.302}{1.15} = 1.132} & (110) \end{matrix}$

The value in equation (110) appears on FIG. 25 as Point A. This example is an illustration of the fact that the local minimum of total power P_(Total), which is achieved by using 100% free cooling, is not necessarily the global minimum of total power. In this case, the free-cooling local minimum at T₀=30.6° C. actually requires 13.2% more power than the global minimum at T₀=16° C., because the higher temperature required to use free cooling incurs more additional leakage power than it saves by eliminating chiller power. This is true whenever the curves in FIG. 25 are above 1.00 on the vertical axis.

8. Optimal Coolant Temperature T*₀

Referring now to FIG. 26, an example of the principal result of this analysis is shown; namely, a map of the optimal, most power-efficient inlet coolant temperature, denoted T*₀, as a function of the computational state parameter λ and the weather-related parameter T₃. This map of T*₀ is found by computations like those plotted on FIGS. 16-24, except that a much finer gradation in T₃ is used, and the global minimum of each curve is saved. These computations yield, for each combination of λ and T₃, the value of T*₀ that produces the global minimum of

$\frac{P_{Total}}{\left( P_{0} \right)_{16{^\circ}\mspace{14mu} {C.}}},$

whether this minimum be the local minimum defined by equation (105), or whether it be the “edge minimum” at 16° C., as discussed in the previous paragraph. For each value of λ, the optimal value T*₀ is 16° C. for low values of T₃, then rises linearly with T₃, and then suddenly drops again to T*₀=16° C. The reason for the sudden drop is explained by the interplay between the aforementioned local minimum and the edge minimum at T*₀=16° C. For example, consider the curves for λ=0.4 on FIG. 21 (T₃=27° C.) and FIG. 22 (T₃=30° C.). The local minimum of the curve on FIG. 21,

${\frac{P_{Total}}{\left( P_{0} \right)_{16{^\circ}\mspace{14mu} {C.}}} = {{1.120\mspace{14mu} {at}\mspace{14mu} T_{0}} = {28.1{^\circ}\mspace{14mu} {C.}}}},$

denoted Point B, is slightly below the edge minimum

${\frac{P_{Total}}{\left( P_{0} \right)_{16{^\circ}\mspace{14mu} {C.}}} = {{1.15\mspace{20mu} {at}\mspace{14mu} T_{0}} = {16{^\circ}\mspace{14mu} {C.}}}},$

whereas the local minimum on FIG. 22,

$\frac{P_{Total}}{\left( P_{0} \right)_{16{^\circ}\mspace{14mu} {C.}}} = 1.155$

at T₀=31.1° C., denoted Point C, is slightly above the edge minimum. It is clear that at some “critical” value of T₃ between T₃=27° C. and T₃=30° C., the value of the local minimum will be exactly

$\frac{P_{Total}}{\left( P_{0} \right)_{16{^\circ}\mspace{14mu} {C.}}} = 1.15$

at an abscissa value of T₀=(T₀)_(Local Min). At this “critical” value of T₃, the value of T₀ that minimizes

$\frac{P_{Total}}{\left( P_{0} \right)_{16{^\circ}\mspace{14mu} {C.}}}$

will, as a function of T₃, suddenly jump from the local minimum at T*₀=(T₀)_(Local Min) to the edge minimum at T*₀=16° C. In the example with λ=0.4, this sudden shift is reflected in FIG. 26 as the near-vertical segment of the λ=0.4 curve at T₃=29.5° C. Once the optimum coolant temperature T*₀ has “downshifted” in this manner to 16° C., it remains at 16° C. for all higher values of T₃.
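The selection between the local minimum and the edge minimum may be sketched as follows. This is a simplified illustration, not the full map computation of FIG. 26: it assumes the local-minimum power and its location have already been read from curves such as those of FIGS. 21 and 22, and the function name is hypothetical. When the two minima are exactly equal, the sketch arbitrarily returns 16° C.; that tie-breaking choice is an assumption.

```python
T0_EDGE = 16.0  # deg C, lowest practical inlet temperature, (T0)_min

def optimal_coolant_temperature(edge_power, local_min_power, local_min_t0):
    """Return T0* as the location of the global minimum of normalized P_Total.

    edge_power:       normalized P_Total at T0 = 16 deg C
    local_min_power:  normalized P_Total at (T0)_Local Min
    local_min_t0:     (T0)_Local Min, deg C
    """
    if local_min_power < edge_power:
        return local_min_t0     # free cooling at the local minimum wins
    return T0_EDGE              # "downshift" to the edge minimum

# Point B (FIG. 21, T3 = 27 deg C, lambda = 0.4): local minimum is lower.
t_star_b = optimal_coolant_temperature(1.15, 1.120, 28.1)
# Point C (FIG. 22, T3 = 30 deg C, lambda = 0.4): edge minimum is lower.
t_star_c = optimal_coolant_temperature(1.15, 1.155, 31.1)
print(t_star_b, t_star_c)  # prints: 28.1 16.0
```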

9. Downshift Boundary

FIG. 27 summarizes the downshift behavior described above by showing the downshift-boundary curve 2702, where each of the diamond-shaped points on curve 2702 is a combination of (T₃, λ) in FIG. 26 where the downshift from T*₀=(T₀)_(Local Min) to T*₀=16° C. occurs. Consequently, in the region of (T₃, λ) space below the downshift-boundary curve 2702, 100% free-cooling (β=1) is optimal: to minimize total power in this region of (T₃, λ) space, the coolant temperature should be whatever the free-cooling system can provide, namely (T₀)_(Local Min), provided this coolant temperature does not imply equipment temperatures that exceed the equipment's thermal limitations. However, in the region of (T₃, λ) space above the downshift-boundary curve 2702, 100% chiller cooling (β=0) is optimal: to minimize total power in this region of (T₃, λ) space, the coolant temperature should be as low as possible consistent with the avoidance of condensation, typically 16° C.

The quantitative results shown in FIGS. 16 through 27, in particular the location of the downshift-boundary curve in FIG. 27, depend on the set of parameters given by equations (89) through (93), to which all of FIGS. 16 through 27 apply. To understand how the results depend on these parameters, consider variation of the four parameters κ, the thermal impedance,

$\frac{\rho \; {cV}}{UA}$

and α, as tabulated in FIG. 28, whose left-hand column refers to FIGS. 29 through 32. FIGS. 29 through 32 illustrate how the downshift boundary varies with κ, the thermal impedance,

$\frac{\rho \; {cV}}{UA}$

and α, respectively. Numerical values written in bold-face type in FIG. 28 are the nominal values used for the earlier FIGS. 16 through 27. As each of the four parameters is varied in FIGS. 29 through 32, the other three parameters are held at these nominal values.

FIG. 29 demonstrates that the location of the downshift boundary is a strong function of κ, as expected, because κ is a factor in the exponent of both exponential functions in equation (34). Increasing κ pulls the downshift boundary to the left, because higher κ implies that computer power P₀ is more sensitive to temperature, which makes the higher coolant temperatures required by free cooling less effective at saving power. Thus, an accurate knowledge of κ for a given system is important in determining where the downshift boundary actually lies.

FIG. 30 shows the downshift boundary's dependence on the thermal impedance. This dependence is strongest at small values of λ, which are important for many computational states, based on the values of λ tabulated in FIG. 11. Increasing the thermal impedance pulls the downshift boundary to the left for the same reason that increasing κ pulls it to the left: P₀ is more sensitive to temperature. Thus, an accurate knowledge of the thermal impedance for a given system is important in determining where the downshift boundary lies.

FIG. 31 shows the downshift boundary's dependence on the dimensionless group

$\frac{\rho \; {cV}}{UA}.$

As

$\frac{\rho \; {cV}}{UA}$

increases, the downshift boundary moves to the left. Physically, this corresponds to the fact that

$\frac{\rho \; {cV}}{UA}$

increases as UA decreases, which makes sense: UA is the figure of merit for the free-cooling heat exchanger, so lower UA implies a less effective free-cooling system, which impedes its usefulness at high T₃, thereby pushing the downshift boundary leftward.

FIG. 32 shows the downshift boundary's dependence on α. Decreasing α pulls the downshift boundary to the left. Physically, this corresponds to the fact that α is the chiller figure of merit: smaller α implies a more efficient chiller, which favors the 100% chiller solution T*₀=16° C. over the 100% free-cooling solution T*₀=(T₀)_(Local Min).

FIGS. 29 through 32 show that, for temperate climates, operation of modern, CMOS-based electronics is likely to take place below the downshift boundary, where overall power consumption is minimized by using the coolant temperature (T₀)_(Local Min) that is naturally provided by 100% free cooling. However, in climates with high wet-bulb temperature, where T₃ is large, operation of CMOS-based electronics may well take place above the downshift boundary, where overall power consumption is minimized by using a coolant temperature that is as low as possible, typically 16° C., despite the availability of free cooling. Thus, especially in high-wet-bulb climates, this invention may be utilized to minimize power consumption in data centers and similar installations of electronics.

In summary, the mathematical model presented above as equations (24) through (50), as well as the numerical examples of this model presented as FIGS. 10 through 31, make it clear that, when electronics such as computers are cooled by a system involving the complementary use of a conventional chiller and a free-cooling system, the temperature dependence of the electronics' power must be taken into account to reach the most energy-efficient choice of coolant temperature T₀. Particularly in climates having high wet-bulb temperatures, it is wrong to believe that using the high coolant temperature required to achieve free cooling is necessarily energy efficient merely because it extends the reach of free cooling. Rather, the total consumed power under various conditions of weather and computational load must be ascertained as a function of coolant temperature T₀ to choose between 100% free cooling and 100% conventional, chiller-based cooling, these two choices being the two potential optima revealed by the downshifting phenomenon described above. However, because the mathematical model above cannot possibly capture all the complexities of a real system, physical measurement of the total power under various conditions is necessary to develop the optimal cooling strategy vs. computational state and weather conditions. The current invention provides apparatus and various methods to accomplish this goal.

Additional Embodiments Suggested by the Mathematical Model

The mathematical model above shows that the optimum, energy-minimizing coolant temperature T*₀ is either (T₀)_(min) or (T₀)_(Local Min). The former temperature, (T₀)_(min), is defined by equation (26), or more generally as slightly higher than the machine room's ambient dew-point temperature. The latter temperature, (T₀)_(Local Min), is defined by equation (105). This mathematical result greatly simplifies the calibration algorithm 400 in FIG. 4 described above in connection with the first embodiment: according to the mathematical model, it is not necessary to test a multitude of values of the set-point coolant temperature (T₀)_(set) at line 414; rather, it is only necessary to test the two values (T₀)_(min) and (T₀)_(Local Min). The value of (T₀)_(min) is given either by adherence to a blanket standard, such as the ASHRAE 2008 Class 1 guideline, T_(Dew Point)=15° C., which suggests (T₀)_(min)=16° C., or by direct measurement of the ambient dew point and application of a rule such as

(T₀)_(min) = T_(Dew Point) + 1° C.  (112)

For given weather conditions, the value of (T₀)_(Local Min) is provided by the system 300 itself: it is the temperature at which coolant 306 is returned to machine room 304 if the free-cooling heat exchanger 330 is used while the chiller 310 is turned off or bypassed.

Consequently, referring to FIG. 33, an additional embodiment of this invention is identical to apparatus 300 except that the calibration algorithm 400 in FIG. 4, executed by processing device 318, is replaced by calibration algorithm 3300, which is identical to algorithm 400 except that lines 414 through 432 are replaced by lines 3314 through 3340. That is, the “for loop” at line 414, which specifies that a multitude of values of (T₀)_(set) be tested, is replaced by the “for loop” at line 3314, which specifies that only two values of (T₀)_(set) be tested, namely (T₀)_(min) and (T₀)_(Local Min).

Likewise, referring to FIG. 34, yet another embodiment of this invention is identical to apparatus 500 except that the operational algorithm 600 in FIG. 6, executed by processing device 518, is replaced by operational algorithm 3400, which is identical to algorithm 600 except that lines 614 through 632 are replaced by lines 3414 through 3440. That is, the “for loop” at line 614, which specifies that a multitude of values of (T₀)_(set) be tested, is replaced by the “for loop” at line 3414, which specifies that only two values of (T₀)_(set) be tested, namely (T₀)_(min) and (T₀)_(Local Min).
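The two-candidate test described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of FIG. 33 or FIG. 34: the function names are hypothetical, and `measure_total_power` stands in for whatever procedure commands the set point, waits for thermal equilibrium, and returns the measured P_(Total)=P₀+P_(Cool).

```python
from typing import Callable, Tuple


def lowest_permissible_temperature(dew_point_c: float, margin_c: float = 1.0) -> float:
    """Rule of equation (112): (T0)_min = T_DewPoint + 1 C (margin is adjustable)."""
    return dew_point_c + margin_c


def choose_coolant_temperature(
    t0_min_c: float,
    t0_local_min_c: float,
    measure_total_power: Callable[[float], float],
) -> Tuple[float, float]:
    """Test only the two candidate set points and return (T0*, P_Total at T0*).

    ASSUMPTION: measure_total_power(t0) is a hypothetical callable that sets
    the coolant temperature to t0, waits for equilibrium, and returns the
    measured total power P_0 + P_Cool in consistent units.
    """
    candidates = [t0_min_c, t0_local_min_c]
    # Pair each measured power with its set point so min() selects the lowest power.
    results = [(measure_total_power(t0), t0) for t0 in candidates]
    best_power, best_t0 = min(results)
    return best_t0, best_power


if __name__ == "__main__":
    # Stand-in for real measurements: normalized values quoted for FIG. 21.
    readings = {16.0: 1.15, 28.1: 1.120}
    t0_min = lowest_permissible_temperature(15.0)
    print(choose_coolant_temperature(t0_min, 28.1, readings.__getitem__))
```

In this sketch, a tie between the two measured powers resolves to the lower temperature, which is an arbitrary choice made for illustration.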

Computer Implementations

FIG. 35 illustrates an exemplary hardware configuration of a computing system 3500. The hardware configuration preferably has at least one processor or central processing unit (CPU) 3511. The CPUs 3511 are interconnected via a system bus 3512 to a random access memory (RAM) 3514, read-only memory (ROM) 3516, input/output (I/O) adapter 3518 (for connecting peripheral devices such as disk units 3521 and tape drives 3540 to the bus 3512), user interface adapter 3522 (for connecting a keyboard 3524, mouse 3526, speaker 3528, microphone 3532, and/or other user interface device to the bus 3512), a communication adapter 3534 for connecting the system 3500 to a data processing network, the Internet, an Intranet, a local area network (LAN), etc., and a display adapter 3536 for connecting the bus 3512 to a display device 3538 and/or printer 3539 (e.g., a digital printer or the like).

As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.

Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with a system, apparatus, or device running an instruction.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with a system, apparatus, or device running an instruction.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may run entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).

Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which run via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which run on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more operable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be run substantially concurrently, or the blocks may sometimes be run in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

While the present invention has been particularly shown and described with respect to preferred embodiments thereof, it will be understood by those skilled in the art that changes in forms and details may be made without departing from the spirit and scope of the present application. It is therefore intended that the present invention not be limited to the exact forms and details described and illustrated herein, but fall within the scope of the appended claims.

1. An apparatus for cooling electronics comprising: a. a fluid-moving means for moving fluid coolant in a closed loop, adapted to permit the coolant to enter the electronics at a temperature T₀ and leave the electronics at a temperature T₁ that is greater than T₀; b. a free-cooling means for receiving the coolant downstream of the electronics, adapted to permit the coolant to enter the free-cooling means at a temperature substantially equal to T₁ and leave the free-cooling means at a temperature T₂ that is less than or equal to T₁; c. a chilling means for receiving coolant downstream of the free-cooling means, adapted to permit the coolant to enter the chilling means at a temperature substantially equal to T₂ and leave the chilling means at the temperature T₀ that is less than or equal to T₂, thereafter to flow to the electronics to complete the closed loop; d. a control means for modulating the operation of the chilling means to obtain a pre-determined value of the temperature T₀; e. a first power-measurement means to measure electrical power consumption P₀ of the electronics; f. a second power-measurement means to measure electrical power consumption P_(Cool) of the chilling means; and g. a computing means, in communication with the control means and with the first and second power-measurement means, for calculating an optimum coolant temperature T*₀ based on P₀ and P_(Cool).
 2. An apparatus according to claim 1, wherein the computing means executes an algorithm that, as a first step, causes the computing means to direct the control means to produce a series of values of the temperature T₀, the i^(th) element of this series being stored in an array denoted T₀[i], each value T₀[i] persisting as the fluid temperature T₀ long enough for the apparatus to come to thermal equilibrium and for the computing means to perform the following steps: i. obtain a reading P₀[i] from the first power-measurement means; ii. obtain a reading P_(Cool)[i] from the second power-measurement means; iii. compute and store the total electrical power consumption for T₀=T₀[i], namely P_(Total)[i]≡P₀[i]+P_(Cool)[i]; and as a second step, causes the computing means to find the value i* of index i for which P_(Total)[i*] is the minimum of the array P_(Total)[i], and to select, for subsequent operation of the apparatus, the corresponding coolant temperature T*₀≡T₀[i*], such that the apparatus will consume, in subsequent operation, the smallest possible amount of electrical power.
 3. An apparatus according to claim 2, wherein the series of values of the temperature T₀ comprises elements T₀[1] and T₀[2], where T₀[1] is a lowest-permissible coolant temperature, and T₀[2] is a coolant temperature obtained when the free-cooling means provides 100% of the cooling, with the chilling means turned off or bypassed.
 4. An apparatus according to claim 3, wherein the lowest-permissible coolant temperature T₀[1] is calculated based on an ambient dew-point temperature T_(Dew Point).
 5. An apparatus according to claim 4, wherein the lowest-permissible coolant temperature T₀[1] is calculated as T₀[1]=T_(Dew Point)+1° C.
 6. An apparatus for cooling electronics comprising: a. plumbing including a pump that moves fluid coolant in a closed loop, adapted to permit the coolant to enter the electronics at a temperature T₀ and leave the electronics at a temperature T₁ that is greater than T₀; b. a heat exchanger that receives the coolant downstream of the electronics, adapted to permit the coolant to enter the heat exchanger at a temperature substantially equal to T₁ and leave the heat exchanger at a temperature T₂ that is less than or equal to T₁; c. a chiller that receives coolant downstream of the heat exchanger, adapted to permit the coolant to enter the chiller at a temperature substantially equal to T₂ and leave the chiller at the temperature T₀ that is less than or equal to T₂, thereafter to flow to the electronics to complete the closed loop; d. a controller that modulates the operation of the chiller to obtain a pre-determined value of the temperature T₀; e. a first power meter that measures electrical power consumption P₀ of the electronics; f. a second power meter that measures electrical power consumption P_(Cool) of the chiller; and g. a computer processor, in communication with the controller and with the first and second power meters, programmed to calculate an optimum coolant temperature T*₀ based on P₀ and P_(Cool).
 7. An apparatus according to claim 6, wherein the computer processor executes an algorithm that, as a first step, causes the computer processor to direct the controller to produce a series of values of the temperature T₀, the i^(th) element of this series being stored in an array denoted T₀[i], each value T₀[i] persisting as the temperature T₀ long enough for the apparatus to come to thermal equilibrium and for the computer processor to perform the following steps: i. obtain a reading P₀[i] from the first power meter; ii. obtain a reading P_(Cool)[i] from the second power meter; iii. compute and store the total electrical power consumption for T₀=T₀[i], namely P_(Total)[i]≡P₀[i]+P_(Cool)[i]; and as a second step, causes the computer processor to find the value i* of index i for which P_(Total)[i*] is the minimum of the array P_(Total)[i], and to select, for subsequent operation of the apparatus, the corresponding coolant temperature T*₀≡T₀[i*], such that the apparatus will consume, in subsequent operation, the smallest possible amount of electrical power.
 8. An apparatus according to claim 7, wherein the series of values of the temperature T₀ comprises elements T₀[1] and T₀[2], where T₀[1] is a lowest-permissible coolant temperature, and T₀[2] is a coolant temperature obtained when the heat exchanger provides 100% of the cooling, with the chiller turned off or bypassed.
 9. An apparatus according to claim 8, wherein the lowest-permissible coolant temperature T₀[1] is calculated based on an ambient dew-point temperature T_(Dew Point).
 10. An apparatus according to claim 9, wherein the lowest-permissible coolant temperature T₀[1] is calculated as T₀[1]=T_(Dew Point)+1° C.
 11. A method for cooling electronics comprising: a. providing a fluid coolant with a pre-determined temperature T₀, b. moving the coolant in a closed-loop system so that the coolant enters the electronics at the temperature T₀ and leaves the electronics at a temperature T₁ that is greater than T₀; c. receiving the coolant in a heat exchanger downstream of the electronics at a temperature substantially equal to T₁ and cooling the coolant to a temperature T₂ that is less than or equal to T₁; d. receiving the coolant in a chiller downstream of the heat exchanger at a temperature substantially equal to T₂ and cooling the coolant to the temperature T₀ that is less than or equal to T₂, thereafter moving the coolant to the electronics to complete the system's closed loop; e. measuring an electrical power consumption P₀ of the electronics; f. measuring an electrical power consumption P_(Cool) of the chiller; and g. calculating an optimum coolant temperature T*₀ based on P₀ and P_(Cool).
 12. A method according to claim 11, wherein the step of calculating comprises executing an algorithm that, as a first step, produces a series of values of the temperature T₀, the i^(th) element of this series being stored in an array denoted T₀[i], each value T₀[i] persisting as the temperature T₀ long enough for the closed-loop system to come to thermal equilibrium and for the following steps to be performed: i. obtain a reading P₀[i] from the step of measuring the electrical power consumption P₀ of the electronics; ii. obtain a reading P_(Cool)[i] from the step of measuring electrical power consumption P_(Cool) of the chiller; iii. compute and store the total electrical power consumption for T₀=T₀[i], namely P_(Total)[i]≡P₀[i]+P_(Cool)[i]; and as a second step, finds the value i* of index i for which P_(Total)[i*] is the minimum of the array P_(Total)[i] and selects, for subsequent operation of the closed-loop system, the corresponding coolant temperature T*₀≡T₀[i*], such that the closed-loop system will consume, in subsequent operation, the smallest possible amount of electrical power.
 13. A method according to claim 12, wherein the series of values of the temperature T₀ comprises elements T₀[1] and T₀[2], where T₀[1] is a lowest-permissible coolant temperature, and T₀[2] is the coolant temperature obtained when the heat exchanger provides 100% of the cooling, with the chiller turned off or bypassed.
 14. A method according to claim 13, wherein the lowest-permissible coolant temperature T₀[1] is calculated based on an ambient dew-point temperature T_(Dew Point).
 15. A method according to claim 13, wherein the lowest-permissible coolant temperature T₀[1] is calculated as T₀[1]=T_(Dew Point)+1° C.